As companies move from testing generative AI tools and models to real-world use (also known as inference), they're having trouble predicting the cloud costs that usage will incur, according to a new report from analyst firm Canalys.
“Unlike training, which is a one-time investment, inference represents a recurring operational cost, making it a crucial constraint on the path to commercializing AI,” said Canalys senior director Rachel Brindley in a statement. “As AI moves from research to large-scale deployment, companies are increasingly focusing on cost-effectiveness in inference, comparing models, cloud platforms, and hardware architectures such as GPUs versus custom accelerators.”
According to Canalys researcher Yi Zhang, many AI services rely on usage-based pricing models that charge per token or API call; that makes it difficult to predict costs when scaling up usage.
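To illustrate why per-token pricing makes budgets hard to pin down, here is a minimal sketch of a usage-based cost estimate. All prices, token counts, and request volumes below are invented for the example and are not real vendor rates; the point is only that spend scales linearly with adoption, so a surge in usage produces a proportional surge in cost.

```python
# Hypothetical sketch of usage-based inference pricing. The per-1k-token
# prices and workload numbers are made up for illustration only.

def monthly_inference_cost(requests_per_day, input_tokens, output_tokens,
                           price_in_per_1k, price_out_per_1k, days=30):
    """Estimate monthly spend for an API priced per 1,000 tokens."""
    per_request = (input_tokens / 1000) * price_in_per_1k \
                + (output_tokens / 1000) * price_out_per_1k
    return requests_per_day * days * per_request

# The same workload at 10x adoption costs 10x more, which is the
# recurring, usage-driven behavior that complicates forecasting.
base = monthly_inference_cost(1_000, 500, 700, 0.0005, 0.0015)
scaled = monthly_inference_cost(10_000, 500, 700, 0.0005, 0.0015)
print(round(base, 2), round(scaled, 2))
```

Unlike a one-time training run, this figure recurs every month and moves with demand, which is why teams compare models, platforms, and hardware on cost per inference rather than on a fixed budget line.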
This story originally appeared on Computerworld