Closing dataset gaps in vision AI, Milestone adds synthetic data and training services
Milestone expands Hafnia with synthetic data and training services to improve vision AI performance in unusual real-world scenarios.
Milestone Systems is extending its Hafnia platform with synthetic data capabilities and a new Training-as-a-Service offering, targeting a persistent limitation in computer vision: the inability to train models for rare and unpredictable scenarios.
Announced at NVIDIA GTC, the update positions Hafnia as a more complete environment for vision AI development, connecting curated datasets, model training, and deployment into a single workflow. The expansion centres on improving how models generalise beyond historical data, particularly in Smart City environments where edge cases often define system performance.
Addressing gaps in real-world datasets
Most vision AI systems rely heavily on historical data, which tends to underrepresent unusual conditions such as rare weather patterns, atypical traffic behaviour, or region-specific variables. This creates blind spots in real-world deployment, especially in urban infrastructure where variability is constant.
Hafnia integrates synthetic data into its existing video library to fill these gaps. Using NVIDIA Cosmos Transfer, developers can generate scenarios that are difficult or unsafe to capture, while also balancing underrepresented object classes and regional variations. The approach keeps real-world data as the foundation, with synthetic augmentation extending coverage rather than replacing it.
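Milestone has not published the internals of this pipeline, but the balancing idea can be illustrated generically. The sketch below uses plain PyTorch with stand-in datasets (all names and shapes are hypothetical, not Hafnia's API) to show how synthetic samples for an underrepresented class can be mixed with real data and oversampled during training.

```python
# Generic sketch, not Hafnia's implementation: combine a real-world dataset
# with synthetic additions and rebalance underrepresented classes.
from collections import Counter

import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Stand-ins: in practice these would be curated real-world frames and
# generated frames covering rare scenarios (hypothetical data below).
real_ds = TensorDataset(torch.randn(900, 3, 64, 64),
                        torch.zeros(900, dtype=torch.long))
synthetic_ds = TensorDataset(torch.randn(100, 3, 64, 64),
                             torch.ones(100, dtype=torch.long))

combined = ConcatDataset([real_ds, synthetic_ds])

# Weight each sample inversely to its class frequency so the rare class
# (here, the synthetic "edge case" class) is sampled more often per epoch.
labels = [int(combined[i][1]) for i in range(len(combined))]
counts = Counter(labels)
weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)

sampler = WeightedRandomSampler(weights, num_samples=len(weights),
                                replacement=True)
loader = DataLoader(combined, batch_size=32, sampler=sampler)

for images, targets in loader:
    pass  # training step would go here
```

The point of the sketch is simply that real data remains the bulk of the training set while synthetic samples lift the sampling frequency of conditions the cameras rarely see.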
“Together with NVIDIA, we are taking Hafnia to the next level by combining trusted real-world data with synthetic augmentation,” said Edward Mauser, Director of Hafnia at Milestone Systems. “This enables developers to train AI models that are not only accurate in known situations, but also resilient in the unexpected.”
Simplifying training workflows
The introduction of Training-as-a-Service shifts Hafnia from a data platform into a more integrated development environment. Instead of assembling fragmented pipelines, developers can access compliant, traceable datasets directly and apply them within their own training workflows.
This reduces the operational overhead associated with sourcing, preparing, and managing training data. It also allows teams to customise datasets and fine-tune models for specific use cases, while maintaining regulatory compliance through fully traceable data sources.
Milestone claims that this streamlined approach can accelerate the development of analytics solutions by up to 30 times, as developers spend less time on data preparation and more on model performance.
Expanding vision models for smart cities
Alongside data and training capabilities, Hafnia now includes VLM-as-a-Service, offering vision language models (VLMs) built on NVIDIA Cosmos Reason. These models are tailored for Smart City applications, where interpreting complex visual and contextual data is critical.
A new EU-optimised traffic model is already in use with selected cities, with additional models planned to expand coverage across different urban scenarios. The hosted approach removes the need for repeated retraining and infrastructure scaling, lowering the barrier to deploying generative AI within computer vision systems.
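Milestone has not published API details for the hosted models, so the following is only a rough illustration of how a hosted VLM is typically queried: a camera frame plus a natural-language question sent over HTTP. The endpoint, field names, and credentials are placeholders, not Hafnia's published interface.

```python
# Rough illustration only: endpoint, payload shape, and field names are
# hypothetical. Shows the general pattern of sending a frame and a prompt
# to a hosted vision language model for alert verification.
import base64

import requests

with open("intersection_frame.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://example-vlm-service.invalid/v1/analyze",  # placeholder URL
    headers={"Authorization": "Bearer <token>"},       # placeholder credential
    json={
        "image": frame_b64,
        "prompt": "Is there a vehicle travelling against the traffic flow?",
    },
    timeout=30,
)
print(resp.json())
```

Because the model is hosted, the integration burden stays on the calling side of requests like this rather than on retraining or scaling inference infrastructure.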
Performance improvements reported by Milestone include better flow and direction analysis, visual feature detection, and alert verification accuracy, pointing to incremental gains in operational reliability rather than a broad model redesign.
Infrastructure and data control considerations
Hafnia’s architecture is built on a multi-cloud setup using AWS, Nebius, and other providers, aligning compute requirements with different stages of the model lifecycle. The synthetic data pipeline, powered by NVIDIA Cosmos components, is being deployed on Nebius.
This approach also addresses data sovereignty requirements, allowing organisations to control where sensitive data is stored and processed. For public sector and Smart City deployments, where compliance and jurisdictional control are critical, this becomes a core design consideration rather than an infrastructure detail.
The platform brings together data sourcing, augmentation, training, and deployment under a single framework, with NVIDIA’s Physical AI Data Factory Blueprint acting as the underlying reference architecture.