Milestone Systems has announced a new generative AI plug-in for its XProtect video management software, developed in collaboration with NVIDIA. The tool aims to automate video review and response, enabling faster decision-making and reducing operator fatigue. A beta version will debut at the Smart City Expo World Congress in Barcelona from 4 to 6 November, with general availability expected later this year.
The plug-in uses generative AI to summarise, contextualise, and validate video footage in real time. By automating the review process, it helps security and operations teams focus on critical incidents rather than being overwhelmed by the sheer volume of video data. Early findings suggest the technology could reduce operator alarm fatigue by up to 30 per cent.
Automating incident reports and improving alarm accuracy
The new XProtect plug-in introduces several key features designed to make video management more efficient. Automated incident reporting converts selected video clips into structured summaries, saving time on manual documentation. Event validation analyses motion events to confirm or dismiss alarms, reducing false positives and improving alert accuracy.
Additionally, contextual bookmark summaries use natural language generation to produce concise overviews of bookmarked footage, allowing teams to quickly identify relevant moments without reviewing each video manually. The plug-in integrates directly with the XProtect rule engine and can be deployed on-premises or in the cloud, giving organisations flexibility in how they manage data compliance and infrastructure.
Built on ethical AI and real-world data
The new solution is powered by Milestone’s Hafnia Vision Language Model (VLM), which has been trained on 75,000 hours of ethically sourced real-world video data from Europe and the United States. Data preparation is supported by NVIDIA Cosmos Curator, and the system runs either in the cloud or in regional data centres powered by NVIDIA technology. The plug-in also uses the NVIDIA Cosmos Reason VLM, and Milestone positions the result as one of the most advanced and compliant video AI platforms available.
Thomas Jensen, CEO of Milestone Systems, said, “With this new XProtect plug-in, we are making advanced video intelligence accessible to cities, organisations, and operators everywhere who manage traffic systems – helping them unlock new levels of efficiency, safety, and insight. XProtect users will get access to state-of-the-art generative AI capabilities, and our partners will be able to build value on top of those new capabilities now available within XProtect. It truly marks a pivotal step in our mission to transform how the world manages and learns from visual data, responsibly and at scale.”
Cities such as Genoa in Italy and Dubuque in Iowa are among the early adopters, using Milestone’s AI-driven solutions to enhance their traffic management systems.
Opening innovation through VLM-as-a-Service
Milestone’s plans extend beyond the XProtect plug-in. The company is introducing Vision Language Model-as-a-Service (VLMaaS) through APIs, allowing developers, system integrators, and partners to build their own generative AI solutions regardless of the video management platform they use.
Live demonstrations of the plug-in will be featured at the Smart City Expo World Congress, presented in partnership with Vaidio at the Dell booth. The showcase will include an AI model benchmarking tool and real-time incident summarisation.
Milestone will continue to highlight its AI developments at its upcoming Developer Summit in Copenhagen on 10 and 11 November. The event will showcase new capabilities of the Hafnia VLM, and the winners of the Hafnia Hackathon will be announced there.
Founded in 1998 and headquartered in Copenhagen, Milestone Systems is a global leader in data-driven video technology, serving sectors such as law enforcement, manufacturing, retail, airports, and traffic management. The company employs over 1,500 people worldwide and has been an independent part of the Canon Group since 2014.



