
AMD supports Zyphra's training of large-scale ZAYA1 AI model

Zyphra has developed ZAYA1, a large-scale Mixture of Experts (MoE) foundation model trained entirely on AMD hardware and software. It is the first such model trained using AMD Instinct MI300X GPUs, AMD Pensando networking, and ROCm open software. The work targets the requirements of large-scale Artificial Intelligence (AI) training.

The model performs competitively with, or better than, several open-source models on reasoning, mathematics, and coding benchmarks, which suggests the platform can handle production-scale AI workloads. ZAYA1 also serves as a data point for evaluating how efficiently large AI models can be trained on AMD hardware.

The training run used AMD Instinct MI300X GPUs, each with 192 GB of High Bandwidth Memory (HBM). That capacity let Zyphra avoid expert or tensor sharding, which simplified training, reduced complexity, and improved throughput. Model save times were also more than ten times faster thanks to AMD-optimized distributed input/output. The ZAYA1-base variant has 8.3 billion total parameters, of which 760 million are active, and performs comparably to models such as Qwen3-4B, Gemma3-12B, Llama-3-8B, and OLMoE.
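The parameter and memory figures above can be put in perspective with some quick arithmetic. The following is only a rough sketch: the article does not state the numeric precision used in training, so the bytes-per-parameter values are assumptions, and the helper names `active_fraction` and `weights_gb` are hypothetical. It counts weights only and ignores gradients, optimizer state, and activations, which dominate real training memory.

```python
# Back-of-the-envelope arithmetic for the ZAYA1-base figures quoted in the article.
TOTAL_PARAMS = 8.3e9    # total parameters (from the article)
ACTIVE_PARAMS = 760e6   # active parameters (from the article)
HBM_GB = 192            # HBM capacity of one MI300X in GB (from the article)

# ASSUMPTION (not from the article): storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that are active."""
    return active_params / total_params


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share of parameters: {frac:.1%}")
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weights_gb(TOTAL_PARAMS, nbytes)
        # Weights only: training also needs gradients, optimizer state, activations.
        print(f"{precision}: {gb:.1f} GB of weights, {gb / HBM_GB:.1%} of one GPU's HBM")
```

Under these assumptions, roughly 9% of the parameters are active, and the full set of weights takes 16.6 GB in bf16, a small fraction of a single GPU's 192 GB. That is consistent with the article's point that the large memory capacity made it possible to skip expert or tensor sharding, though the real training footprint is considerably larger than the weights alone.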

In collaboration with AMD and IBM, Zyphra designed and deployed a large-scale training cluster that pairs AMD Instinct MI300X GPUs with AMD Pensando networking. The jointly engineered system runs on IBM Cloud's high-performance fabric and storage architecture and served as the environment for ZAYA1's large-scale pretraining.

Emad Barsoum, corporate vice president of AI and engineering at AMD, said, “AMD leadership in accelerated computing is empowering innovators like Zyphra to push the boundaries of what’s possible in AI.”

Zyphra's CEO, Krithik Puthalath, said, “Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers. ZAYA1 reflects this philosophy and we are thrilled to be the first company to demonstrate large-scale training on an AMD platform. Our results highlight the power of co-designing model architectures with silicon and systems, and we’re excited to deepen our collaboration with AMD and IBM as we build the next generation of advanced multimodal foundation models.”

The companies said they plan to build on this foundation and to continue integrating AMD's hardware and software in future AI model development.