Alibaba has unveiled Wan2.2-S2V, an open-source speech-to-video model designed to generate digital human videos. The technology enables users to convert portrait photos into film-quality avatars capable of speaking, singing, and performing, broadening the possibilities for professional content creation.
Expanding video creation capabilities
Part of the Wan2.2 video generation series, Wan2.2-S2V allows creators to generate animated videos from a single image and an audio clip. It supports multiple framing options including portrait, bust, and full-body perspectives, and can dynamically generate character actions and environmental details based on prompts.
The model is powered by advanced audio-driven animation technology that delivers natural and expressive performances, from dialogue to musical pieces. It also supports scenes featuring multiple characters and a wide range of avatars, including cartoon, animal, and stylised designs.
To meet varied production needs, the tool provides flexible output resolutions of 480p and 720p, making it suitable for both professional presentations and social media content across different creative contexts.
Combining innovation and efficiency
Wan2.2-S2V improves upon traditional talking-head animation by merging text-guided global motion control with audio-driven fine-grained local movements. This combination allows for expressive and lifelike performances across complex scenarios.
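One way to picture this hybrid control is as two conditioning signals operating at different scales: a slow, text-driven global trajectory and fast, audio-driven local offsets. The sketch below is purely illustrative; the additive blend, array shapes, and variable names are assumptions, not the model's actual architecture:

```python
import numpy as np

def combined_motion(global_motion: np.ndarray, local_motion: np.ndarray) -> np.ndarray:
    """Blend a text-conditioned global trajectory (T, D) with
    audio-conditioned per-frame local offsets (T, D).
    The simple additive blend is an illustrative assumption."""
    return global_motion + local_motion

T, D = 48, 3  # 48 frames, 3 motion channels (assumed for illustration)
# Slow drift over the clip, e.g. body or camera motion implied by the prompt.
global_motion = np.linspace(0, 1, T)[:, None] * np.ones((T, D))
# Fast oscillation, e.g. lip and facial motion synchronized to audio.
local_motion = 0.05 * np.sin(np.linspace(0, 20, T))[:, None] * np.ones((T, D))

motion = combined_motion(global_motion, local_motion)
assert motion.shape == (T, D)
```

The point of the separation is that each signal can be learned and controlled independently while the final performance reflects both.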
A notable advancement lies in its frame processing approach. By compressing historical frames of any length into a single latent representation, the model reduces computational demands and ensures stability in long-video generation, addressing a common challenge for extended animated productions.
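The benefit of that compression can be sketched in a few lines. Mean pooling here is only a stand-in for the model's learned compression, and the 256-dimensional latent size is an assumption, but it shows why the cost of conditioning on history stays constant regardless of how long the video grows:

```python
import numpy as np

def compress_history(frame_latents: np.ndarray) -> np.ndarray:
    """Collapse an arbitrary number of per-frame latents (T, D) into a
    single fixed-size latent (D,). Mean pooling is an illustrative
    stand-in for the model's learned compression."""
    return frame_latents.mean(axis=0)

# Whether the history holds 10 frames or 1,000, the conditioning
# signal passed to the next generation step has the same size.
short_history = np.random.randn(10, 256)
long_history = np.random.randn(1000, 256)
assert compress_history(short_history).shape == (256,)
assert compress_history(long_history).shape == (256,)
```

Because the downstream generator only ever sees one latent, memory and compute no longer scale with video length, which is what makes stable long-video generation tractable.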
Alibaba’s research team also built a large-scale audio-visual dataset tailored to film and television scenarios to train the model. Using multi-resolution training, it supports video creation in diverse formats, from short-form vertical content to conventional horizontal film and television outputs.
Commitment to open-source community
The Wan2.2-S2V model is available for download on Hugging Face, GitHub, and Alibaba Cloud’s ModelScope. Alibaba has been steadily contributing to the open-source ecosystem, previously releasing Wan2.1 models in February 2025 and Wan2.2 models in July. Together, the Wan series has recorded over 6.9 million downloads across Hugging Face and ModelScope.
Alibaba said the release reflects its ongoing efforts to support professional creators with advanced AI tools while contributing to the wider developer community.