ABI Research Forecasts AI Inference to Surpass Training by 2033

ABI Research forecast a shift in AI workload demand toward inference, as enterprise adoption and production-scale compute needs become central to cloud AI infrastructure planning.

The firm projected that AI inference workloads would grow at a 42% compound annual growth rate (CAGR), surpass training workloads by 2033, and reach more than 46 gigawatts of capacity consumption by 2035.
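To show how quickly a 42% rate compounds, here is a rough sketch of the standard CAGR relationship. It is not ABI Research's forecasting methodology. It also assumes no base-year value, since the starting point of the forecast is not given here. The helper names `compound` and `cagr` are hypothetical.

```python
def compound(value: float, rate: float, years: int) -> float:
    """Project `value` forward by `years` periods at compound annual growth `rate`."""
    return value * (1 + rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Illustrative only: at a 42% CAGR, any workload grows roughly 33x over ten years,
# whatever its (unstated) starting size.
multiplier = compound(1.0, 0.42, 10)
print(f"10-year multiplier at 42% CAGR: {multiplier:.1f}x")

# Round trip: recovering the rate from the two endpoints gives back 0.42.
print(f"Recovered rate: {cagr(1.0, multiplier, 10):.2f}")
```

The multiplier depends only on the rate and the number of years. That is why a CAGR alone cannot tell you absolute capacity without a base-year figure.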

ABI Research projected training demand would also expand to 36 gigawatts by 2035, while fine-tuning would surpass foundation model training by 2032 and reach 21 gigawatts by 2035. Foundation model training was projected to reach nearly 13 gigawatts by 2035.

Within inference workloads, code generation was forecast to scale to roughly 24 gigawatts by 2035 and account for more than half of total inference capacity consumption. Text generation was projected to reach about 7 gigawatts by 2035, and audio generation was forecast to grow fastest, at a 42% CAGR. ABI Research expected inference-focused neocloud providers to nearly catch up with hyperscalers in total inference capacity consumption by 2035, with neoclouds reaching 15 gigawatts and hyperscalers 16 gigawatts.

“Inference is the commercial engine of the AI market, and its market activity is accelerating at an incredible pace with better model capabilities and the computational demands of agentic systems,” said Larbi Belkhit, Senior Analyst at ABI Research. “For the last several years, the cloud demand and build-out centered predominantly on training frontier models, but the next wave of competition will be won by providers that can deliver inference at scale with the right balance of performance, latency, cost, and compute utilization.”

“These forecasts show that AI infrastructure strategy is moving into a new phase, where success depends less on headline model size and more on operationalizing AI across real-world workloads,” Belkhit said. “Cloud providers, enterprises, and the wider AI supply chain are preparing for a far more heterogeneous compute landscape shaped by long-running autonomous agentic systems, multi-modal workloads, and scalable inference demand.”