Red Hat has announced an expanded collaboration with Amazon Web Services to strengthen how enterprises run generative AI workloads across hybrid cloud environments. The company aims to give IT leaders more choice and efficiency by enabling Red Hat AI to run on AWS Trainium and Inferentia chips.
Rising demand for scalable AI performance
The rapid adoption of generative AI is prompting organisations to reassess how they build and scale their infrastructure. With inference workloads growing quickly, businesses are looking for ways to optimise performance while managing costs. IDC forecasts that by 2027, 40 per cent of organisations will adopt custom silicon, such as Arm-based processors or chips designed specifically for AI and machine learning. Red Hat said this underlines the need for infrastructure that can deliver high-performance inference more efficiently.
Joe Fernandes, vice president and general manager of Red Hat’s AI Business Unit, said the company’s work with AWS reflects these shifts in enterprise requirements. “By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organisations to deploy and scale AI workloads with enhanced efficiency and flexibility,” he said. He added that the partnership is rooted in Red Hat’s open source approach, aiming to make generative AI more accessible and cost-effective across hybrid cloud environments.
Integrating Red Hat AI with AWS AI accelerators
A key part of the collaboration is the enablement of the Red Hat AI Inference Server on AWS Trainium and Inferentia chips. Red Hat said this will create a common inference layer capable of running any generative AI model with lower latency and improved price performance. The company expects customers to see 30 to 40 per cent better price performance than comparable GPU-based Amazon EC2 instances.
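Because the inference server is built on vLLM, applications typically reach it through an OpenAI-compatible API, which is what makes a common inference layer practical: client code does not change when the serving hardware does. The sketch below shows what such a call looks like; the endpoint URL and model name are illustrative placeholders, not details from the announcement.

```python
# Minimal sketch: calling a vLLM-based inference server through its
# OpenAI-compatible API. The endpoint and model name are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://inference.example.internal:8000/v1",  # placeholder route
    api_key="EMPTY",  # vLLM accepts a dummy key unless authentication is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model is deployed
    messages=[{"role": "user", "content": "Summarise this quarter's support tickets."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The same request can be served from GPU-backed or Trainium- and Inferentia-backed instances; the price-performance difference shows up in the infrastructure bill rather than in application code.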
Colin Brace, vice president of Annapurna Labs at AWS, said the joint effort is designed to support organisations looking for alternatives that deliver strong performance without high operational costs. “Enterprises demand solutions that deliver exceptional performance, cost efficiency, and operational choice for mission-critical AI workloads,” he said. “AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective. Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale.”
Red Hat and AWS have also developed an AWS Neuron operator for Red Hat OpenShift, Red Hat OpenShift AI and Red Hat OpenShift Service on AWS. This gives customers a more seamless route to running AI workloads on AWS accelerators within OpenShift environments. Red Hat has additionally released the amazon.ai Certified Ansible Collection to simplify orchestration of AI services on AWS.
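In Kubernetes terms, a device operator of this kind typically advertises accelerators as an extended resource that pods can request. The sketch below, using the Kubernetes Python client, shows the general shape of such a request; the resource name aws.amazon.com/neuron is the one used by AWS's Neuron device plugin, while the container image and namespace are placeholders rather than details from the announcement.

```python
# Sketch: requesting an AWS Neuron device for a pod via the Kubernetes
# Python client. The image and namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference-server:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    # Ask the scheduler for one Trainium/Inferentia device,
                    # exposed as an extended resource by the Neuron device plugin.
                    limits={"aws.amazon.com/neuron": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```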
Another part of the collaboration involves upstream community work. Red Hat and AWS are jointly optimising a plugin for AWS AI chips that has been contributed to vLLM. Red Hat is the largest commercial contributor to the project, which also underpins llm-d, an open source framework for large-scale distributed inference that is now commercially supported in Red Hat OpenShift AI 3.
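vLLM's plugin architecture is what makes this kind of contribution useful downstream: a hardware backend is picked up when it is installed, while the user-facing API stays the same. As a rough illustration, assuming an environment where the appropriate backend (GPU, or an AWS Neuron plugin of the kind described above) is already installed, offline inference with vLLM looks like the following; the model name is illustrative.

```python
# Sketch of vLLM's offline inference API. The hardware backend is selected
# at install/detection time, so this code is the same across accelerators.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain hybrid cloud inference in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```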
Supporting hybrid cloud strategies across industries
Red Hat said the companies’ long-standing partnership is now evolving to meet demand from organisations adopting AI across datacentres, public cloud and edge environments. Many enterprises are building hybrid cloud strategies that rely on consistent performance and efficient operations.
Jean-François Gamache, chief information officer and vice president of Digital Services at CAE, said Red Hat OpenShift Service on AWS has already played a significant role in its digital transformation. “This platform supports our developers in focusing on high-value initiatives, driving product innovation and accelerating AI integration across our solutions,” he said. He added that OpenShift’s flexibility and scalability have helped the organisation improve customer-facing outcomes, from real-time insights to faster response times for user-reported issues.
Industry analysts believe the collaboration addresses an important shift in enterprise AI adoption. Anurag Agrawal, founder and chief global analyst at Techaisle, said organisations are increasingly focused on balancing performance with long-term cost sustainability. “As AI inference costs escalate, enterprises are prioritising efficiency alongside performance,” he said. “This collaboration exemplifies Red Hat’s ‘any model, any hardware’ strategy by combining its open hybrid cloud platform with the distinct economic advantages of AWS Trainium and Inferentia. It empowers CIOs to operationalise generative AI at scale, shifting from cost-intensive experimentation to sustainable, governed production.”