Mistral AI has introduced the Mistral 3 family of open-source multilingual and multimodal models, developed to run efficiently across Nvidia’s supercomputing and edge platforms. The launch marks a closer partnership between the two companies as they work to advance large-scale AI for enterprise use.
Expanding model capabilities across cloud and edge
The new Mistral 3 range includes models designed for both frontier-level performance and compact edge deployment. The flagship model, Mistral Large 3, uses a mixture-of-experts architecture that activates only the most relevant parts of the network for each token. This approach is intended to improve efficiency while maintaining accuracy, allowing enterprises to scale AI systems without excessive compute demands.
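For readers unfamiliar with the technique, the sketch below illustrates the general top-k routing pattern behind mixture-of-experts layers. The dimensions, expert count and PyTorch layer choices are illustrative placeholders, not details of Mistral Large 3 itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so per-token compute
        # scales with *active* parameters rather than total parameters.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

print(ToyMoELayer()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because the gate selects only two of the eight toy experts per token, most of the layer's weights sit idle on any single forward pass, which is the property that lets a very large total parameter count coexist with a modest per-token compute bill.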
Mistral Large 3 features 675 billion total parameters, of which 41 billion (roughly 6%) are active for any given token, along with a 256,000-token context window. According to the company, these specifications enable high scalability and adaptability across demanding enterprise workloads. The models are available across cloud environments, data centres and edge devices from 2 December.
Mistral AI describes this release as part of an emerging phase of distributed intelligence, where models can operate flexibly across a wide range of hardware while bridging the gap between research innovation and practical deployment.
Performance gains through Nvidia-optimised architecture
The partnership leverages Nvidia's GB200 NVL72 systems alongside the Mistral 3 model architecture to improve performance across large AI workloads. By tapping into the coherent memory domain provided by Nvidia NVLink and into expert-parallelism optimisations, the mixture-of-experts design can use hardware resources more efficiently at scale.
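Expert parallelism generally means sharding the experts themselves across GPUs and sending each token's activations to whichever device holds its chosen expert. The toy below simulates only the dispatch step in plain Python, with placeholder sizes; in production this grouping feeds an all-to-all exchange over NVLink, which a single-process sketch cannot show.

```python
import numpy as np

N_EXPERTS, N_DEVICES, N_TOKENS = 8, 4, 16

# Expert parallelism shards the expert weights across devices:
# experts 0-1 live on GPU 0, experts 2-3 on GPU 1, and so on.
device_of_expert = {e: e * N_DEVICES // N_EXPERTS for e in range(N_EXPERTS)}

# Pretend the router has already picked one expert per token.
rng = np.random.default_rng(0)
token_expert = rng.integers(0, N_EXPERTS, size=N_TOKENS)

# Dispatch step: group token indices by the device owning their expert.
# A real system feeds these buckets into an all-to-all collective so each
# GPU receives exactly the tokens its resident experts must process.
buckets = {d: [] for d in range(N_DEVICES)}
for tok, exp in enumerate(token_expert):
    buckets[device_of_expert[int(exp)]].append(tok)

for dev, toks in buckets.items():
    print(f"GPU {dev} receives tokens {toks}")
```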
These gains are further supported by the low-precision NVFP4 number format and Nvidia Dynamo's disaggregated inference optimisations. Together, these improvements aim to raise training and inference throughput without affecting model accuracy.
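NVFP4 itself is a hardware number format (4-bit floats with a shared scale per small block of values), so software can only approximate the idea. The sketch below quantises a tensor onto the 4-bit E2M1 value grid with one scale per 16-element block, purely to show why blockwise scaling preserves accuracy better than a single global scale; it is not the actual NVFP4 encode path.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 attaches one scale factor to each small block of values

def quantize_fp4_blockwise(x):
    """Toy blockwise FP4 quantiser: scale each block so its largest value
    maps to 6.0, then snap everything to the nearest FP4 magnitude."""
    x = x.reshape(-1, BLOCK)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0          # avoid dividing an all-zero block
    scaled = x / scale
    # Nearest-neighbour rounding onto the FP4 grid, preserving sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).ravel()

x = np.random.randn(64).astype(np.float32)
err = np.abs(x - quantize_fp4_blockwise(x)).mean()
print(f"mean abs quantisation error: {err:.4f}")
```

Because each block gets its own scale, an outlier in one block cannot crush the resolution available to the rest of the tensor, which is the intuition behind block-scaled formats like NVFP4.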
On the GB200 NVL72 platform, Mistral Large 3 delivered a tenfold performance increase compared with the earlier Nvidia H200 generation. The improvement is expected to help businesses reduce per-token costs, enhance energy efficiency and improve overall user experience.
Bringing AI to the edge with compact models
Alongside the flagship model, Mistral AI has released nine small language models under the Ministral 3 suite. These compact models are designed to run on Nvidia’s edge platforms, including RTX PCs and laptops, the Nvidia Spark platform and Jetson devices.
Nvidia is also working with popular open-source frameworks such as Llama.cpp and Ollama to optimise performance across its GPUs. Developers can already test the Ministral 3 models through these tools, enabling fast and efficient execution on local hardware.
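As a hedged illustration of that local workflow, the snippet below uses the Ollama Python client. The model tag is a placeholder, since the exact name under which Ministral 3 appears in the Ollama model library may differ.

```python
# Requires a local Ollama install plus the Python client: pip install ollama
import ollama

# Placeholder tag -- check the Ollama model library for the published name.
MODEL = "ministral-3"

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user",
               "content": "Summarise mixture-of-experts in one sentence."}],
)
print(response["message"]["content"])
```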
The entire Mistral 3 family is openly available, giving researchers and developers freedom to customise and build on the models. Nvidia’s NeMo tools, including Data Designer, Customizer, Guardrails and the NeMo Agent Toolkit, offer additional pathways for enterprises to refine models for specific applications and accelerate deployment from early prototypes to production systems.
To support consistent performance from cloud to edge, Nvidia has also optimised several inference frameworks — TensorRT-LLM, SGLang and vLLM — for the new model family. The models are accessible on major open-source platforms and cloud providers, with deployment as Nvidia NIM microservices expected soon.
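For vLLM, a minimal offline-inference sketch might look like the following; the Hugging Face repository id is hypothetical and should be replaced with the model's published name.

```python
# Requires vLLM (pip install vllm) and a suitable GPU.
from vllm import LLM, SamplingParams

# Hypothetical repository id -- substitute the real checkpoint name.
MODEL_ID = "mistralai/Ministral-3"

llm = LLM(model=MODEL_ID)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["What does a 256,000-token context window enable?"], params)
print(outputs[0].outputs[0].text)
```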