Zyphra Trains First Large-Scale AI Model Entirely on AMD Hardware

AI startup Zyphra has achieved a milestone by training ZAYA1, the first large-scale Mixture-of-Experts model built entirely on AMD's Instinct MI300X GPUs, challenging NVIDIA's dominance in AI training infrastructure.

Decoded

Published Nov 25, 2025


Breaking NVIDIA's Monopoly

Zyphra announced on November 24 that it has successfully trained ZAYA1, a sophisticated Mixture-of-Experts (MoE) foundation model, using exclusively AMD Instinct MI300X GPUs, Pensando networking, and the ROCm open software stack. The achievement marks the first time a large-scale MoE model has been developed without relying on NVIDIA's dominant GPU infrastructure. According to a technical report published by Zyphra, the model delivers competitive or superior performance to leading open models across reasoning, mathematics, and coding benchmarks.

Outperforming the Competition

ZAYA1-base, with 8.3 billion total parameters and 760 million active parameters, matches or exceeds the performance of Meta's Llama-3-8B, Alibaba's Qwen3-4B, Google's Gemma3-12B, and the Allen Institute for AI's OLMoE. The sparse architecture enables the model to achieve these results while consuming significantly less compute during inference compared to dense models, since only a fraction of the parameters are activated for each token. The model was trained on 12 trillion tokens across a 128-node cluster, with each node equipped with eight MI300X GPUs.
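The gap between total and active parameters comes from top-k expert routing, the mechanism common to MoE models: a small router scores a set of expert networks per token and only the best-scoring few actually run. The sketch below illustrates that general idea; it is not Zyphra's implementation, and all names and sizes here are invented for illustration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts layer: route one token through top_k experts."""
    logits = x @ gate_w                      # router score per expert
    top = np.argsort(logits)[-top_k:]        # indices of the top_k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the selected experts' parameters are "active" for this token;
    # the remaining experts are skipped entirely.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w, top_k=2)
```

With `top_k=2` of 8 equal-sized experts, each token touches only a quarter of the expert parameters, which is why ZAYA1's 760 million active parameters can sit inside an 8.3-billion-parameter model.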

Memory Advantage Drives Efficiency

The MI300X's 192GB of high-bandwidth memory proved critical to the training success, allowing Zyphra to avoid complex expert and tensor sharding techniques that typically complicate MoE training. This simplified architecture reduced training complexity and improved throughput across the model stack. Zyphra also reported more than 10x faster model save times using AMD-optimized distributed I/O, enhancing training reliability and reducing checkpoint overhead.

Strategic Collaboration

The project involved close collaboration between Zyphra, AMD, and IBM Cloud, which provided high-performance fabric and storage architecture for the training cluster. "Efficiency has always been a core guiding principle at Zyphra," said Krithik Puthalath, CEO of Zyphra, in AMD's announcement. "Our results highlight the power of co-designing model architectures with silicon and systems."

Technical Innovations

Zyphra implemented proprietary architectural innovations including Compressed Contextual Attention (CCA), which uses convolutions within the attention mechanism to perform full attention operations in compressed latent space. The company also refined the expert router component and employed lighter-touch residual scaling for stability in deeper layers. These optimizations, combined with custom kernels and specialized parallelism schemes, enabled efficient large-scale training on the AMD platform.
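Zyphra has not published CCA's implementation beyond the description above, but the general idea of attending over a compressed sequence can be sketched. The toy version below substitutes average pooling for the convolutional compression the article describes; every name and shape here is an assumption for illustration, not Zyphra's method.

```python
import numpy as np

def compressed_attention(q, k, v, stride=4):
    """Illustrative attention over a compressed sequence: keys and values
    are pooled along the sequence axis (a stand-in for a learned strided
    convolution) before standard softmax attention."""
    def pool(m):
        seq, d = m.shape
        # Drop any remainder, then average each window of `stride` positions.
        return m[: seq - seq % stride].reshape(-1, stride, d).mean(axis=1)
    k_c, v_c = pool(k), pool(v)                          # (seq/stride, d)
    scores = q @ k_c.T / np.sqrt(q.shape[-1])            # (seq, seq/stride)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v_c                                 # (seq, d)

rng = np.random.default_rng(1)
seq, d = 32, 8
q, k, v = (rng.normal(size=(seq, d)) for _ in range(3))
out = compressed_attention(q, k, v, stride=4)
```

The payoff of this shape of design is that the attention matrix scales with `seq * (seq / stride)` rather than `seq * seq`, which is one plausible reading of why a compressed latent space would cut training cost.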

Decoded Take

This announcement represents a pivotal moment in the AI hardware landscape, directly challenging NVIDIA's near-monopoly in AI training infrastructure. By successfully training a complex MoE model entirely on AMD hardware and achieving competitive performance against models from tech giants like Meta, Google, and Alibaba, Zyphra has proven that viable alternatives exist for organizations building AI systems. The 10x improvement in model save times and simplified training architecture translate to real cost savings and operational advantages, which could accelerate AMD's adoption in enterprise AI deployments. This development arrives as organizations globally seek to diversify their GPU suppliers amid supply constraints and cost pressures, potentially fragmenting a market that has been overwhelmingly dominated by a single vendor. The success also validates AMD's multi-year investment in building a complete AI stack encompassing hardware, networking, and open-source software, positioning the company as a credible competitor in the lucrative AI accelerator market.
