Salesforce AI Research has introduced a series of innovations aimed at helping enterprises prepare AI agents for real-world use. The company unveiled an advanced simulation platform for testing AI in complex business settings, launched new benchmarking tools to measure agent performance, and enhanced its Data Cloud with smarter data unification capabilities.
Preparing AI agents for real-world scenarios
To improve the way AI agents are trained before deployment, Salesforce has released CRMArena-Pro, an enterprise-grade simulation environment. Building on the earlier CRMArena tool, the new platform tests AI performance in multi-turn, multi-agent tasks such as sales forecasting, customer service triage and configure-price-quote processes. The environment uses synthetic data and mimics real business complexity, including API integrations and safeguards for personal information.
By modelling unpredictable business events, CRMArena-Pro enables companies to evaluate an AI agent’s accuracy, efficiency and resilience before it goes live. The system acts as a digital twin of enterprise operations, allowing safe experimentation while preparing AI to handle edge cases like supply chain disruptions or high-pressure customer service scenarios.
Setting benchmarks for AI readiness
Salesforce also introduced the Agentic Benchmark for CRM, billed as the first assessment designed to test AI agents on CRM-specific business tasks rather than general-purpose ones. It measures five key metrics: accuracy, cost, speed, trust and safety, and sustainability. This approach gives IT leaders a clearer way to compare models and select those that match their operational needs.
Sustainability is a new metric in the evaluation process, reflecting the environmental impact of AI systems. As models become larger and more resource-intensive, the benchmark helps companies balance computing demands with performance needs.
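To make the idea of comparing models across the five metrics concrete, here is a minimal sketch of a weighted comparison. It is not how the Agentic Benchmark for CRM computes its results. The function names, model names, scores and weights are all hypothetical, and the sketch assumes every metric has already been normalised to a 0-to-1 scale where higher is better (so a cheaper or faster model gets a higher cost or speed score).

```python
# Illustrative only: a weighted comparison across the five metrics the article names.
# All names, scores and weights below are hypothetical, not benchmark output.

METRICS = ("accuracy", "cost", "speed", "trust_and_safety", "sustainability")


def weighted_score(scores: dict, weights: dict) -> float:
    """Return the weighted average of normalised metric scores (0..1, higher is better)."""
    missing = [m for m in METRICS if m not in scores or m not in weights]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    total_weight = sum(weights[m] for m in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[m] * weights[m] for m in METRICS) / total_weight


def rank_models(candidates: dict, weights: dict) -> list:
    """Return model names ordered from best to worst under the given weights."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Made-up scores for two hypothetical models.
candidates = {
    "model_a": {"accuracy": 0.9, "cost": 0.4, "speed": 0.5,
                "trust_and_safety": 0.8, "sustainability": 0.4},
    "model_b": {"accuracy": 0.8, "cost": 0.8, "speed": 0.7,
                "trust_and_safety": 0.8, "sustainability": 0.7},
}

# Two hypothetical priorities: every metric equal, or accuracy above all else.
equal_weights = {m: 1.0 for m in METRICS}
accuracy_first = {**equal_weights, "accuracy": 20.0}

if __name__ == "__main__":
    print(rank_models(candidates, equal_weights))
    print(rank_models(candidates, accuracy_first))
```

The point of the sketch is that the ranking depends on the weights: a team that prioritises accuracy above everything else may pick a different model from one that treats cost and sustainability as equally important.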
Complementing this, Salesforce released MCP-Eval and MCP-Universe, two additional benchmarking frameworks. MCP-Eval provides scalable synthetic tests across a wide range of systems, while MCP-Universe offers more challenging, real-world scenarios to identify where AI agents might fail. Together, they allow organisations to diagnose weaknesses and fine-tune their agents for reliable enterprise performance.
Improving data for AI-driven decisions
Recognising that quality data is crucial for AI, Salesforce has added a new capability to its Data Cloud called Account Matching. This system uses large and small language models to automatically merge duplicate and inconsistent records across business units. Instead of manual rule-based clean-ups, AI now matches and unifies accounts — for example, linking “The Example Company, Inc.” with “Example Co.” into a single record.
Early customer results show significant impact. One company using Account Matching unified over one million accounts in its first month, achieving a 95% match rate and cutting average handling time by 30 minutes per case. The tool also reduces manual work by sending only the most complex data cases to humans, speeding up sales cycles and improving efficiency.
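The pattern of merging clear matches automatically and sending only the hard cases to people can be sketched as a simple confidence-based router. This is an illustration, not Account Matching itself: the article says the product uses language models, whereas the sketch below substitutes a plain string-similarity score from Python's standard library. The suffix list, thresholds and function names are assumptions made for the example.

```python
# Illustrative only: confidence-based routing of candidate account pairs.
# A standard-library string similarity stands in for the language-model matcher
# described in the article; thresholds and the suffix list are hypothetical.
import re
from difflib import SequenceMatcher

# Hypothetical filler words and legal suffixes to ignore when comparing names.
IGNORED_TOKENS = {"the", "inc", "co", "corp", "company", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation and remove ignored tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two account names."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        # Nothing distinctive left to compare, so do not claim a match.
        return 0.0
    return SequenceMatcher(None, na, nb).ratio()


def route(a: str, b: str, auto_merge_at: float = 0.9, review_at: float = 0.6) -> str:
    """Decide whether a pair is merged, sent to a human, or kept separate."""
    score = similarity(a, b)
    if score >= auto_merge_at:
        return "merge"
    if score >= review_at:
        return "human_review"
    return "keep_separate"


if __name__ == "__main__":
    print(route("The Example Company, Inc.", "Example Co."))
    print(route("Example Co.", "Exampel Co."))
    print(route("Example Co.", "Unrelated Widgets Ltd"))
```

Under these assumed thresholds, the article's own example pair is merged automatically, a near-miss spelling goes to a reviewer, and clearly different names stay separate. Raising `auto_merge_at` sends more pairs to human review in exchange for fewer wrong merges.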
Moving toward the agentic enterprise
These updates show Salesforce’s push to equip companies with practical tools to adopt AI responsibly and effectively. By combining simulation, reliable benchmarking and clean data, the company aims to help businesses build “agentic enterprises” — organisations where AI works alongside humans to handle everyday tasks, reduce operational friction and support growth.