Sunday, 30 November 2025

Red Hat launches Red Hat AI 3 to bring distributed AI inference to production

Red Hat AI 3 enables distributed AI inference at scale, improving collaboration and accelerating enterprise adoption of AI.

Red Hat has introduced Red Hat AI 3, the latest version of its enterprise AI platform that aims to make large-scale artificial intelligence easier to deploy and manage in production. Combining Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI), and Red Hat OpenShift AI, the new platform is designed to simplify high-performance AI inference, improve collaboration between teams, and accelerate the move from experimentation to real-world applications.

Addressing enterprise AI deployment challenges

Many organisations struggle to scale AI projects beyond proof-of-concept due to concerns around data privacy, cost, and managing a wide range of models. Research by the Massachusetts Institute of Technology’s NANDA project shows that about 95% of companies have yet to see measurable financial returns from an estimated US$40 billion in enterprise AI spending.

Red Hat AI 3 directly targets these barriers by offering a consistent, unified platform that helps CIOs and IT leaders maximise returns on accelerated computing technologies. It allows AI workloads to be scaled and distributed across hybrid and multi-vendor environments, while improving cross-team collaboration on advanced AI use cases such as intelligent agents. Built on open standards, the platform supports any model on any hardware accelerator, whether deployed in data centres, public clouds, sovereign AI environments, or at the edge.

Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit, said, “As enterprises scale AI from experimentation to production, they face a new wave of complexity, cost and control challenges. With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimises these hurdles. By bringing new capabilities like distributed inference with llm-d and a foundation for agentic AI, we are enabling IT teams to more confidently operationalise next-generation AI, on their own terms, across any infrastructure.”

Enabling scalable, cost-efficient inference

As AI projects evolve from model training to the inference stage — where systems generate outputs and insights — efficiency, scalability, and cost management become critical. Red Hat AI 3 builds on the success of open source projects like vLLM and llm-d to deliver production-grade serving of large language models (LLMs).

With the release of Red Hat OpenShift AI 3.0, llm-d becomes generally available, transforming how LLMs run natively on Kubernetes. It enables intelligent, distributed inference and leverages the orchestration capabilities of Kubernetes alongside the performance of vLLM. Technologies such as the Kubernetes Gateway API Inference Extension, NVIDIA’s NIXL low-latency data transfer library, and the DeepEP Mixture of Experts (MoE) communication library further enhance efficiency and responsiveness.
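The efficiency of this distributed setup comes largely from routing requests intelligently across model servers. As a rough illustration only (the worker names and routing key below are invented for this sketch, not llm-d's actual API), a gateway can send prompts that share a common prefix to the same vLLM worker so that worker's KV cache for the prefix is reused:

```python
import hashlib

# Hypothetical sketch of cache-aware routing of the kind an llm-d-style
# inference gateway performs. Worker names and the "route on the first
# line of the prompt" rule are illustrative assumptions.
WORKERS = ["vllm-pod-0", "vllm-pod-1", "vllm-pod-2"]

def route(prompt: str, workers=WORKERS) -> str:
    """Send prompts that share a system-prompt prefix to the same worker,
    so its vLLM instance can reuse the KV cache built for that prefix."""
    prefix = prompt.split("\n", 1)[0]  # shared system prompt
    key = hashlib.sha256(prefix.encode()).hexdigest()
    return workers[int(key, 16) % len(workers)]

shared = "You are a helpful assistant. Answer briefly."
a = route(shared + "\nQ: What is Kubernetes?")
b = route(shared + "\nQ: What is vLLM?")
assert a == b  # same prefix -> same worker -> likely cache hit
```

Real deployments layer load and cache-state awareness on top of this kind of affinity, but the core idea is the same: keep related requests on the same accelerator.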

This distributed approach lowers costs, improves response times, and enables consistent performance, even with highly variable workloads or very large models such as MoE systems.

Ujval Kapasi, vice president of Engineering AI Frameworks at NVIDIA, said, “Scalable, high-performance inference is key to the next wave of generative and agentic AI. With built-in support for accelerated inference with open source NVIDIA Dynamo and NIXL technologies, Red Hat AI 3 provides a unified platform that empowers teams to move swiftly from experimentation to running advanced AI workloads and agents at scale.”

Driving collaboration and building the foundation for agentic AI

Red Hat AI 3 provides a unified, collaborative environment for both platform and AI engineers, streamlining workflows from prototype to production. A new Model as a Service (MaaS) feature enables IT teams to act as their own MaaS providers, centrally serving shared models and offering on-demand access to developers and applications. This approach improves cost control and supports use cases that cannot rely on public AI services due to privacy or data restrictions.
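The MaaS pattern boils down to a central catalogue that hands out shared model endpoints and meters who uses them. The sketch below is illustrative only (the class, endpoint URL, and model name are assumptions for this example, not Red Hat's API), but it shows why the pattern helps with cost control:

```python
# Hypothetical sketch of the Model-as-a-Service idea: a platform team
# serves models centrally and hands out endpoints on demand, metering
# usage per consuming team. All names and URLs are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModelCatalog:
    endpoints: dict = field(default_factory=dict)  # model name -> URL
    usage: dict = field(default_factory=dict)      # team -> request count

    def register(self, name: str, url: str) -> None:
        self.endpoints[name] = url

    def resolve(self, team: str, name: str) -> str:
        """Return the shared endpoint and meter the request, so platform
        owners can track and cap cost per team."""
        if name not in self.endpoints:
            raise KeyError(f"model {name!r} is not served centrally")
        self.usage[team] = self.usage.get(team, 0) + 1
        return self.endpoints[name]

maas = ModelCatalog()
maas.register("gpt-oss", "http://llm-gateway.internal/v1")
url = maas.resolve(team="app-dev", name="gpt-oss")
```

Because every consumer goes through one resolver, sensitive workloads never leave the organisation's own infrastructure, which is what makes the pattern viable where public AI services are ruled out.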

The AI Hub offers a central location to explore, deploy, and manage foundational AI assets, with a curated catalogue of validated and optimised models, a model lifecycle registry, and tools for deployment and monitoring. Meanwhile, Gen AI Studio gives AI engineers a hands-on space to experiment with models, prototype applications, and fine-tune prompts for use cases such as chat and retrieval-augmented generation (RAG).
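Retrieval-augmented generation, one of the use cases Gen AI Studio targets, follows a simple shape: retrieve the most relevant context for a query, then prepend it to the prompt the model sees. A minimal sketch, with word-overlap retrieval standing in for the embeddings and vector store a real pipeline would use:

```python
# Minimal RAG sketch: pick the snippet with the most words in common
# with the query, then assemble the augmented prompt. Illustrative only;
# production pipelines use embedding similarity, not word overlap.
def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = [
    "OpenShift AI 3.0 makes llm-d generally available.",
    "Voxtral Mini targets voice-enabled agents.",
]
prompt = build_prompt("What does llm-d ship in?", docs)
```

Grounding the model in retrieved context like this is what lets chat applications answer from private documents rather than from training data alone.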

Red Hat AI 3 also ships with a curated selection of open source models, including OpenAI’s gpt-oss, DeepSeek-R1, Whisper for speech-to-text, and Voxtral Mini for voice-enabled agents.

Looking ahead, Red Hat is positioning its platform as a key enabler of the emerging era of agentic AI — autonomous AI systems capable of managing complex workflows. Red Hat OpenShift AI 3.0 introduces a Unified API layer built on Llama Stack to align with industry standards, including OpenAI-compatible protocols. It is also among the early adopters of the Model Context Protocol (MCP), an emerging standard that improves how AI models connect with external tools.
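In practice, "OpenAI-compatible" means a server accepts the chat-completions request shape that the OpenAI API popularised, so existing client code works unchanged against the unified layer. A minimal sketch of that payload (the model name is illustrative; the endpoint path shown in the comment is the conventional one):

```python
# Sketch of the OpenAI-compatible chat-completions payload that a
# unified API layer would accept. The model name is an illustrative
# assumption.
import json

def chat_request(model: str, user_msg: str) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return json.dumps(payload)

body = chat_request("gpt-oss", "Summarise our open incidents.")
# A client would POST `body` to the gateway's /v1/chat/completions path.
```

Because the shape is standard, swapping one served model for another is a one-string change on the client side.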

A new modular toolkit built on Red Hat’s InstructLab further supports model customisation. It includes Python libraries, synthetic data generation tools, and an evaluation hub, allowing developers to fine-tune large language models using proprietary data with greater precision.
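Synthetic data generation, one step in that toolkit, expands a handful of seed question-answer pairs into a larger fine-tuning set. InstructLab does this with a teacher model; in the hedged sketch below, simple rephrasing templates stand in for the teacher so only the data shape is shown (the seeds and templates are invented for this example):

```python
# Illustrative sketch of the synthetic-data step in a fine-tuning
# workflow: expand seed Q&A pairs into more training examples.
# InstructLab uses a teacher model for this; templates stand in here.
import itertools

SEEDS = [("What is OpenShift?", "A Kubernetes application platform.")]
TEMPLATES = ["{q}", "In one sentence: {q}", "Explain briefly: {q}"]

def expand(seeds, templates):
    return [
        {"question": t.format(q=q), "answer": a}
        for (q, a), t in itertools.product(seeds, templates)
    ]

dataset = expand(SEEDS, TEMPLATES)  # 1 seed x 3 templates -> 3 examples
```

The resulting records can then feed an evaluation hub and a tuning run, which is the loop the toolkit is built around.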

Mariano Greco, chief executive officer of ARSAT, said, “By building our agentic AI platform on Red Hat OpenShift AI, we went from identifying the need to live production in just 45 days. Red Hat OpenShift AI has not only helped us improve our service and reduce the time engineers spend on support issues, but also freed them up to focus on innovation and new developments.”

Red Hat’s launch is supported by key industry partners, including AMD, which provides the hardware foundation through EPYC processors, Instinct GPUs, and the ROCm software stack. Dan McNamara, senior vice president and general manager of Server and Enterprise AI at AMD, said, “Together, we’ve integrated the efficiency of AMD EPYC processors, the scalability of AMD Instinct GPUs, and the openness of the AMD ROCm software stack to help enterprises move beyond experimentation and operationalise next-generation AI — turning performance and scalability into real business impact across on-prem, cloud, and edge environments.”
