XPENG achieves AI breakthrough in autonomous driving research
XPENG and Peking University unveil FastDriveVLA, an AI framework that cuts autonomous driving compute load nearly 7.5-fold while maintaining planning accuracy.
XPENG has announced a major research milestone in autonomous driving: a collaborative paper developed with Peking University has been accepted at the AAAI 2026 conference of the Association for the Advancement of Artificial Intelligence. The research introduces a new visual token pruning framework designed to improve the efficiency of end-to-end autonomous driving systems while maintaining high accuracy in complex driving environments.
The paper, titled “FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning”, was selected from a highly competitive field. AAAI 2026 received 23,680 submissions, of which only 4,167 were accepted, representing an acceptance rate of 17.6 percent. Acceptance at the conference is widely regarded as recognition of technical depth and originality within the global artificial intelligence research community.
The collaboration combines XPENG’s applied automotive AI capabilities with academic research from Peking University. It reflects a broader industry shift towards deeper investment in foundational AI research as automakers seek to overcome the scalability, cost, and real-time performance challenges that continue to limit the deployment of advanced autonomous driving systems.
A new approach to visual token pruning for driving AI
At the centre of the research is FastDriveVLA, a visual token pruning framework designed specifically for Vision-Language-Action models used in end-to-end autonomous driving. These models convert images into large numbers of visual tokens, which form the basis for how the system understands its surroundings and generates driving actions. While effective, processing such a high volume of tokens significantly increases computational load onboard vehicles, affecting inference speed and real-time responsiveness.
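To make the scale of the problem concrete, the sketch below shows the usual ViT-style arithmetic by which an image becomes visual tokens. The function name, resolution, and patch size are illustrative assumptions, not details from the paper:

```python
# Illustrative sketch (not XPENG's implementation): in a ViT-style
# encoder, each non-overlapping image patch becomes one visual token,
# so token count grows with resolution, camera count, and frame count.

def count_visual_tokens(image_h: int, image_w: int, patch_size: int) -> int:
    """Number of visual tokens for one frame with square patches."""
    return (image_h // patch_size) * (image_w // patch_size)

# Example numbers only: a 448x448 frame with 14x14 patches yields
# 32 * 32 = 1024 tokens; multi-camera, multi-frame inputs multiply
# this into the thousands that the VLA model must then process.
tokens_per_frame = count_visual_tokens(448, 448, 14)
print(tokens_per_frame)  # 1024
```

Every one of those tokens flows through the language-model backbone, which is why the token count directly drives onboard inference cost.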
FastDriveVLA addresses this challenge by enabling the AI system to focus only on essential visual information, in a manner similar to how human drivers prioritise critical foreground elements such as lanes, vehicles, and pedestrians while disregarding non-essential background details. By filtering out irrelevant visual data, the framework reduces computational demands without compromising the quality of driving decisions.
Unlike existing token pruning approaches that rely on text-visual attention mechanisms or token similarity, FastDriveVLA adopts a reconstruction-based method. It introduces an adversarial foreground-background reconstruction strategy that strengthens the model’s ability to identify and preserve high-value tokens. This approach allows the system to retain strong scene understanding while operating more efficiently.
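The pruning step itself can be sketched as a top-k selection over per-token importance scores. In FastDriveVLA those scores would come from the reconstruction-trained scorer; the sketch below simply takes them as given, and all names, shapes, and numbers are illustrative:

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float):
    """Keep the top-scoring fraction of visual tokens.

    tokens: (N, D) array of token embeddings
    scores: (N,) per-token importance; in FastDriveVLA these would be
            produced by the reconstruction-based scorer (assumed here)
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k highest scores
    keep.sort()                     # restore spatial/sequence order
    return tokens[keep], keep

# Example mirroring the paper's 3,249 -> 812 setting (keep ~25%).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3249, 16))
scores = rng.random(3249)
pruned, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.25)
print(pruned.shape)  # (812, 16)
```

The downstream action decoder then sees only the retained tokens, which is where the compute savings come from; the hard part, and the paper's contribution, is training the scorer so that the retained quarter carries the safety-critical foreground information.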
Performance results and implications for L4 autonomy
The framework was evaluated using the nuScenes autonomous driving benchmark, where it achieved state-of-the-art performance across multiple pruning ratios. When the number of visual tokens was reduced from 3,249 to 812, FastDriveVLA delivered a nearly 7.5 times reduction in computational load while maintaining high planning accuracy. These results highlight the framework’s potential to support more efficient deployment of advanced autonomous driving models in real-world vehicles.
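A rough back-of-envelope calculation, using an assumed generic transformer cost model rather than the paper's own measurements, shows why a 4x token cut (3,249 to 812) can yield a super-linear compute saving: projection and MLP costs scale linearly with token count, while self-attention scales quadratically. The constants below are illustrative assumptions:

```python
# Assumed generic cost model (not the paper's): one transformer layer
# costs roughly a*N*d^2 for projections/MLP (linear in token count N)
# plus b*N^2*d for self-attention (quadratic in N), at model width d.

def layer_flops(n_tokens: int, d_model: int,
                mlp_factor: int = 8, attn_factor: int = 2) -> int:
    projections = mlp_factor * n_tokens * d_model ** 2
    attention = attn_factor * n_tokens ** 2 * d_model
    return projections + attention

full = layer_flops(3249, 1024)    # all visual tokens
pruned = layer_flops(812, 1024)   # after 4x token pruning
print(round(full / pruned, 2))    # ~6x under these assumed constants
```

Depending on the linear/quadratic mix of the real model, the overall saving lands between 4x (purely linear) and 16x (purely quadratic), which makes the paper's reported near-7.5x reduction a plausible point in that range.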
This marks the second time in 2026 that XPENG has been recognised at a leading global AI conference. Earlier in the year, the company was the only Chinese automaker invited to speak at the CVPR Workshop on Autonomous Driving, where it shared progress on autonomous driving foundation models. At its Tech Day event in November, XPENG also unveiled its VLA 2.0 architecture, which removes the language translation step from traditional Vision-Language-Action pipelines and enables direct visual-to-action generation.
XPENG stated that these developments reflect its full-stack in-house capabilities, spanning model architecture design, training, distillation, and vehicle deployment. Looking ahead, the company reaffirmed its commitment to achieving Level 4 autonomous driving, with continued investment in large-scale AI models aimed at accelerating the integration of physical AI systems into vehicles and delivering safer, more efficient, and more comfortable intelligent driving experiences globally.
