News
Apr 22, 2026
NewDecoded

Image by ShengShu
ShengShu Technology has closed a Series A+ funding round exceeding RMB 600 million to accelerate its multimodal AI research. The investment was co-led by Zhongguancun Science City and LINK-X CAPITAL, with participation from strategic partners such as Wondershare and Visual China Group. The funds are earmarked for scaling the Vidu generative platform and advancing the company's video foundation models globally.
The company recently launched Vidu Q3, which currently ranks as the top video model in China and second globally according to benchmark data from Artificial Analysis. This model represents a breakthrough in digital storytelling, featuring the industry's first native synchronized audio-visual generation. Creators can now produce 16-second, high-definition clips with precise lip-sync and cinematic shot transitions in a single pass, addressing long-standing consistency issues in AI video.
In a bid to democratize high-end production, ShengShu open-sourced its TurboDiffusion framework in late 2025. This technology accelerates video generation by up to 200 times without sacrificing visual quality. Using a single consumer-grade GPU, the system can generate a five-second high-definition video in under two seconds. This leap in efficiency has led to widespread adoption by major tech firms like Tencent and ByteDance.
ShengShu reported a tenfold increase in both user base and revenue over the past year, reflecting strong market demand. The Vidu ecosystem now serves enterprise clients in over 200 countries, including Samsung, L'Oréal, and Amazon. The platform is used in animation, gaming, and digital marketing to streamline content production workflows.
Looking ahead, the company aims to bridge the gap between digital content and physical reality. Founder and Chief Scientist Jun Zhu noted that these multimodal frameworks have the potential to evolve into true world models that understand the underlying structures of reality. By processing temporal video data, ShengShu intends to support end-to-end machine decision-making and integrate AI more deeply into physical environments and robotics.
This funding signifies a pivotal shift in the generative AI landscape from experimental tools toward high-efficiency industrial production. By solving the multi-entity consistency problem and introducing native audio-sync, ShengShu is positioning video generation as a viable replacement for traditional rendering pipelines. The move to open-source the TurboDiffusion framework suggests a strategic attempt to set a global standard for AI video infrastructure, effectively challenging established players by prioritizing speed and accessibility. As these models evolve into world models, the industry moves closer to a future where AI understands and interacts with the physical world through visual reasoning.