Friday, 28 November 2025

AMD powers Zyphra’s large-scale AI training milestone

Zyphra trains its ZAYA1 foundation model entirely on AMD hardware, marking a major step for large-scale AI development.

AMD has announced that Zyphra has completed training ZAYA1, a new Mixture-of-Experts (MoE) foundation model built entirely on AMD’s GPU and networking platform. The work marks the first time a large-scale MoE model has been trained using AMD Instinct MI300X GPUs together with AMD Pensando networking and the ROCm open software stack.

Zyphra detailed the achievement in a technical report published on 28 November. According to the company, ZAYA1 delivers competitive or superior results across reasoning, mathematics and coding benchmarks when compared with leading open models. The results suggest that AMD's platform can support production-scale AI workloads of the kind that typically run on rival GPU systems.

Emad Barsoum, corporate vice president of AI and engineering in AMD’s Artificial Intelligence Group, said the milestone highlights how the company’s technology can support modern AI development. “AMD leadership in accelerated computing is empowering innovators like Zyphra to push the boundaries of what’s possible in AI,” he said. “This milestone showcases the power and flexibility of AMD Instinct GPUs and Pensando networking for training complex, large-scale models.”

Zyphra’s chief executive Krithik Puthalath said the model reflects the company’s broader focus on efficiency. “Efficiency has always been a core guiding principle at Zyphra. It shapes how we design model architectures, develop algorithms for training and inference, and choose the hardware with the best price-performance to deliver frontier intelligence to our customers,” he said. He added that the organisation is “thrilled to be the first company to demonstrate large-scale training on an AMD platform” and intends to continue working with AMD and IBM as it develops future multimodal foundation models.

Focus on memory capacity and training throughput

Zyphra reported that the MI300X GPU's 192 GB of high-bandwidth memory was central to the model's training efficiency. The extra capacity let the team avoid expert and tensor sharding, techniques that add complexity and can slow performance. Zyphra also said model checkpoint saves were more than ten times faster thanks to AMD's optimised distributed I/O, which improved reliability during large-scale runs.
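To see why 192 GB per GPU can remove the need for sharding, a rough back-of-envelope helps. The byte counts below are common assumptions for mixed-precision Adam training, not figures from Zyphra's report; the sketch is purely illustrative:

```python
# Back-of-envelope memory estimate (assumed costs, not figures from the report):
# mixed-precision Adam training is often budgeted at ~16 bytes per parameter
# (bf16 weights + bf16 grads + fp32 master copy + two fp32 Adam moments).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4

total_params = 8.3e9      # ZAYA1-Base total parameter count
hbm_per_gpu_gb = 192      # AMD Instinct MI300X HBM capacity

state_gb = total_params * BYTES_PER_PARAM / 1e9
print(f"Model + optimizer state: ~{state_gb:.0f} GB vs {hbm_per_gpu_gb} GB HBM")
```

Under these assumptions the full model and optimizer state come to roughly 133 GB, which fits in a single MI300X's 192 GB (ignoring activations and parallelism overheads), whereas an 80 GB-class GPU would have to shard the model across devices.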

ZAYA1-Base contains 8.3 billion total parameters, with 760 million active at any given moment. Despite the lower active parameter count, the model matches or exceeds the performance of several well-known systems, including Qwen3-4B from Alibaba, Gemma3-12B from Google, Meta’s Llama-3-8B and OLMoE.
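The efficiency implied by the sparse MoE design can be sketched with simple arithmetic using the parameter figures above (the Python below is purely illustrative):

```python
# Illustrative arithmetic only: ZAYA1-Base parameter figures from the article.
total_params = 8.3e9    # total parameters in the MoE model
active_params = 760e6   # parameters active at any given moment

# Fraction of the network's weights actually exercised per step
active_fraction = active_params / total_params
print(f"Active fraction: {active_fraction:.1%}")  # roughly 9% of total weights
```

In other words, only about one-eleventh of the model's weights do work at a time, which is why ZAYA1 can compete with dense models several times its active size at a lower compute cost per step.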

Joint work with AMD and IBM on large-scale infrastructure

The development builds on earlier collaboration between Zyphra, AMD and IBM. Together, the companies designed and deployed a large-scale training cluster that combines AMD Instinct GPUs with IBM Cloud’s high-performance fabric and storage architecture. The system, first announced earlier in the quarter, provided the infrastructure required to train ZAYA1 at scale.

The companies said the engineering partnership enabled Zyphra to run complex pretraining workloads more efficiently, supported by AMD’s hardware platform and IBM’s cloud-native performance architecture.

The ZAYA1 report, together with accompanying updates from both companies, outlines the training approach, model design and AMD technologies used during development. AMD said the milestone reflects growing momentum around its GPU platform as an alternative to well-established competitors in large-scale AI training.

