Sunday, 30 November 2025

Red Hat launches Red Hat AI 3 to bring distributed AI inference to production

Red Hat AI 3 enables distributed AI inference at scale, improving collaboration and accelerating enterprise adoption of AI.

Red Hat has introduced Red Hat AI 3, the latest version of its enterprise AI platform that aims to make large-scale artificial intelligence easier to deploy and manage in production. Combining Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI), and Red Hat OpenShift AI, the new platform is designed to simplify high-performance AI inference, improve collaboration between teams, and accelerate the move from experimentation to real-world applications.

Addressing enterprise AI deployment challenges

Many organisations struggle to scale AI projects beyond proof-of-concept due to concerns around data privacy, cost, and managing a wide range of models. Research by the Massachusetts Institute of Technology’s NANDA project shows that about 95% of companies have yet to see measurable financial returns from an estimated US$40 billion in enterprise AI spending.

Red Hat AI 3 directly targets these barriers by offering a consistent, unified platform that helps CIOs and IT leaders maximise returns on accelerated computing technologies. It allows AI workloads to be scaled and distributed across hybrid and multi-vendor environments, while improving cross-team collaboration on advanced AI use cases such as intelligent agents. Built on open standards, the platform supports any model on any hardware accelerator, whether deployed in data centres, public clouds, sovereign AI environments, or at the edge.

Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit, said, “As enterprises scale AI from experimentation to production, they face a new wave of complexity, cost and control challenges. With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimises these hurdles. By bringing new capabilities like distributed inference with llm-d and a foundation for agentic AI, we are enabling IT teams to more confidently operationalise next-generation AI, on their own terms, across any infrastructure.”

Enabling scalable, cost-efficient inference

As AI projects evolve from model training to the inference stage — where systems generate outputs and insights — efficiency, scalability, and cost management become critical. Red Hat AI 3 builds on the success of open source projects like vLLM and llm-d to deliver production-grade serving of large language models (LLMs).

With the release of Red Hat OpenShift AI 3.0, llm-d becomes generally available, transforming how LLMs run natively on Kubernetes. It enables intelligent, distributed inference and leverages the orchestration capabilities of Kubernetes alongside the performance of vLLM. Technologies such as the Kubernetes Gateway API Inference Extension, NVIDIA’s NIXL low-latency data transfer library, and the DeepEP Mixture of Experts (MoE) communication library further enhance efficiency and responsiveness.
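The efficiency of this distributed setup comes largely from routing requests intelligently across model servers. As a rough illustration only (the worker names and routing key below are invented for this sketch, not llm-d's actual API), a gateway can send prompts that share a common prefix to the same vLLM worker so that worker's KV cache for the prefix is reused:

```python
import hashlib

# Hypothetical sketch of cache-aware routing of the kind an llm-d-style
# inference gateway performs. Worker names and the "route on the first
# line of the prompt" rule are illustrative assumptions.
WORKERS = ["vllm-pod-0", "vllm-pod-1", "vllm-pod-2"]

def route(prompt: str, workers=WORKERS) -> str:
    """Send prompts that share a system-prompt prefix to the same worker,
    so its vLLM instance can reuse the KV cache built for that prefix."""
    prefix = prompt.split("\n", 1)[0]  # shared system prompt
    key = hashlib.sha256(prefix.encode()).hexdigest()
    return workers[int(key, 16) % len(workers)]

shared = "You are a helpful assistant. Answer briefly."
a = route(shared + "\nQ: What is Kubernetes?")
b = route(shared + "\nQ: What is vLLM?")
assert a == b  # same prefix -> same worker -> likely cache hit
```

Real deployments layer load and cache-state awareness on top of this kind of affinity, but the core idea is the same: keep related requests on the same accelerator.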

This distributed approach lowers costs, improves response times, and enables consistent performance, even with highly variable workloads or very large models such as MoE systems.

Ujval Kapasi, vice president of Engineering AI Frameworks at NVIDIA, said, “Scalable, high-performance inference is key to the next wave of generative and agentic AI. With built-in support for accelerated inference with open source NVIDIA Dynamo and NIXL technologies, Red Hat AI 3 provides a unified platform that empowers teams to move swiftly from experimentation to running advanced AI workloads and agents at scale.”

Driving collaboration and building the foundation for agentic AI

Red Hat AI 3 provides a unified, collaborative environment for both platform and AI engineers, streamlining workflows from prototype to production. A new Model as a Service (MaaS) feature enables IT teams to act as their own MaaS providers, centrally serving shared models and offering on-demand access to developers and applications. This approach improves cost control and supports use cases that cannot rely on public AI services due to privacy or data restrictions.
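The MaaS pattern boils down to a central catalogue that hands out shared model endpoints and meters who uses them. The sketch below is illustrative only (the class, endpoint URL, and model name are assumptions for this example, not Red Hat's API), but it shows why the pattern helps with cost control:

```python
# Hypothetical sketch of the Model-as-a-Service idea: a platform team
# serves models centrally and hands out endpoints on demand, metering
# usage per consuming team. All names and URLs are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModelCatalog:
    endpoints: dict = field(default_factory=dict)  # model name -> URL
    usage: dict = field(default_factory=dict)      # team -> request count

    def register(self, name: str, url: str) -> None:
        self.endpoints[name] = url

    def resolve(self, team: str, name: str) -> str:
        """Return the shared endpoint and meter the request, so platform
        owners can track and cap cost per team."""
        if name not in self.endpoints:
            raise KeyError(f"model {name!r} is not served centrally")
        self.usage[team] = self.usage.get(team, 0) + 1
        return self.endpoints[name]

maas = ModelCatalog()
maas.register("gpt-oss", "http://llm-gateway.internal/v1")
url = maas.resolve(team="app-dev", name="gpt-oss")
```

Because every consumer goes through one resolver, sensitive workloads never leave the organisation's own infrastructure, which is what makes the pattern viable where public AI services are ruled out.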

The AI Hub offers a central location to explore, deploy, and manage foundational AI assets, with a curated catalogue of validated and optimised models, a model lifecycle registry, and tools for deployment and monitoring. Meanwhile, Gen AI Studio gives AI engineers a hands-on space to experiment with models, prototype applications, and fine-tune prompts for use cases such as chat and retrieval-augmented generation (RAG).
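Retrieval-augmented generation, one of the use cases Gen AI Studio targets, follows a simple shape: retrieve the most relevant context for a query, then prepend it to the prompt the model sees. A minimal sketch, with word-overlap retrieval standing in for the embeddings and vector store a real pipeline would use:

```python
# Minimal RAG sketch: pick the snippet with the most words in common
# with the query, then assemble the augmented prompt. Illustrative only;
# production pipelines use embedding similarity, not word overlap.
def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = [
    "OpenShift AI 3.0 makes llm-d generally available.",
    "Voxtral Mini targets voice-enabled agents.",
]
prompt = build_prompt("What does llm-d ship in?", docs)
```

Grounding the model in retrieved context like this is what lets chat applications answer from private documents rather than from training data alone.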

Red Hat AI 3 also ships with a curated selection of open source models, including OpenAI’s gpt-oss, DeepSeek-R1, Whisper for speech-to-text, and Voxtral Mini for voice-enabled agents.

Looking ahead, Red Hat is positioning its platform as a key enabler of the emerging era of agentic AI — autonomous AI systems capable of managing complex workflows. Red Hat OpenShift AI 3.0 introduces a Unified API layer built on Llama Stack to align with industry standards, including OpenAI-compatible protocols. It is also among the early adopters of the Model Context Protocol (MCP), an emerging standard that improves how AI models connect with external tools.
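In practice, "OpenAI-compatible" means a server accepts the chat-completions request shape that the OpenAI API popularised, so existing client code works unchanged against the unified layer. A minimal sketch of that payload (the model name is illustrative; the endpoint path shown in the comment is the conventional one):

```python
# Sketch of the OpenAI-compatible chat-completions payload that a
# unified API layer would accept. The model name is an illustrative
# assumption.
import json

def chat_request(model: str, user_msg: str) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return json.dumps(payload)

body = chat_request("gpt-oss", "Summarise our open incidents.")
# A client would POST `body` to the gateway's /v1/chat/completions path.
```

Because the shape is standard, swapping one served model for another is a one-string change on the client side.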

A new modular toolkit built on Red Hat’s InstructLab further supports model customisation. It includes Python libraries, synthetic data generation tools, and an evaluation hub, allowing developers to fine-tune large language models using proprietary data with greater precision.
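Synthetic data generation, one step in that toolkit, expands a handful of seed question-answer pairs into a larger fine-tuning set. InstructLab does this with a teacher model; in the hedged sketch below, simple rephrasing templates stand in for the teacher so only the data shape is shown (the seeds and templates are invented for this example):

```python
# Illustrative sketch of the synthetic-data step in a fine-tuning
# workflow: expand seed Q&A pairs into more training examples.
# InstructLab uses a teacher model for this; templates stand in here.
import itertools

SEEDS = [("What is OpenShift?", "A Kubernetes application platform.")]
TEMPLATES = ["{q}", "In one sentence: {q}", "Explain briefly: {q}"]

def expand(seeds, templates):
    return [
        {"question": t.format(q=q), "answer": a}
        for (q, a), t in itertools.product(seeds, templates)
    ]

dataset = expand(SEEDS, TEMPLATES)  # 1 seed x 3 templates -> 3 examples
```

The resulting records can then feed an evaluation hub and a tuning run, which is the loop the toolkit is built around.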

Mariano Greco, chief executive officer of ARSAT, said, “By building our agentic AI platform on Red Hat OpenShift AI, we went from identifying the need to live production in just 45 days. Red Hat OpenShift AI has not only helped us improve our service and reduce the time engineers spend on support issues, but also freed them up to focus on innovation and new developments.”

Red Hat’s launch is supported by key industry partners, including AMD, which provides the hardware foundation through EPYC processors, Instinct GPUs, and the ROCm software stack. Dan McNamara, senior vice president and general manager of Server and Enterprise AI at AMD, said, “Together, we’ve integrated the efficiency of AMD EPYC processors, the scalability of AMD Instinct GPUs, and the openness of the AMD ROCm software stack to help enterprises move beyond experimentation and operationalise next-generation AI — turning performance and scalability into real business impact across on-prem, cloud, and edge environments.”
