Amazon Web Services deploys Cerebras inference chips

Amazon Web Services (AWS) said it had reached a multiyear agreement to integrate artificial intelligence (AI) processors from Cerebras Systems into its data centers, a move the release described as a sign of growing confidence in the semiconductor startup. Financial terms were not disclosed.

The companies framed the arrangement as part of a broader shift in compute strategy, as AI services in production increased demand for efficient inference. The release said providers were diversifying their silicon options because traditional graphics processing units (GPUs) remained the default for training but could be suboptimal for low‑latency, high‑throughput inference workloads.

The technical outline in the release identified Cerebras’s Wafer‑Scale Engine (WSE) and its wafer‑scale architecture as the core component being added to AWS infrastructure. It cited claims that the WSE could execute the decode phase of generative AI processing at up to 25 times the speed of conventional graphics processing unit (GPU) solutions. The release also said AWS would integrate Cerebras technology alongside its own Trainium processors.
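Because the cited figure applies to the decode phase only, the end-to-end gain for a request depends on how much of its time is spent decoding. The sketch below applies Amdahl's law to illustrate that relationship. The 25x value is the upper bound cited in the release; the decode shares and the function name `end_to_end_speedup` are hypothetical and chosen only for illustration, not taken from the release.

```python
def end_to_end_speedup(decode_fraction: float, decode_speedup: float = 25.0) -> float:
    """Amdahl's law: overall speedup when only the decode phase gets faster.

    decode_fraction is the share of baseline request time spent in decode
    (a hypothetical input, not a figure from the release).
    decode_speedup defaults to the "up to 25 times" claim cited in the release.
    """
    if not 0.0 <= decode_fraction <= 1.0:
        raise ValueError("decode_fraction must be between 0 and 1")
    if decode_speedup <= 0:
        raise ValueError("decode_speedup must be positive")
    # The non-decode time is unchanged; the decode time shrinks by decode_speedup.
    return 1.0 / ((1.0 - decode_fraction) + decode_fraction / decode_speedup)


# Hypothetical decode shares, for illustration only.
for share in (0.5, 0.8, 0.95):
    print(f"decode share {share:.0%}: {end_to_end_speedup(share):.2f}x overall")
```

Under these assumed shares, the overall gain works out to roughly 1.92x, 4.31x and 11.36x, well below the 25x ceiling, which is reached only if a request spends all of its time in decode.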

The collaboration was described as a multiyear deployment of the Cerebras WSE to accelerate inference workloads, with support for both aggregated and disaggregated configurations. The startup said it expected customers to need access to both modes, along with workload routing between them. The release noted that Cerebras had an existing compute partnership with OpenAI involving up to 750 megawatts of capacity.

“But if you want fast tokens, if speed matters to you, if you’re doing coding or agentic work, not only are we the absolute fastest, but we intend to set the bar. We’re in this to win it,” said Andrew Feldman, chief executive of Cerebras.

“Our job is to push the speed and lower the price,” said Nafea Bshara of AWS, noting that AWS would continue to offer cost-optimized Trainium-only options alongside high-performance Cerebras-Trainium configurations.

The companies described plans to position the joint offering as a premium cloud inference solution.