Vidu can generate a four-second clip in 30 seconds, according to a statement from Shengshu. That makes it one of the fastest tools on the market, as similar tools usually take longer to generate a video of comparable length.
Shengshu exemplifies how China’s prestigious Tsinghua University has emerged as a major force behind the country’s AI ambitions. Vidu is built on the firm’s self-developed architecture, U-ViT, which was first detailed in a September 2022 research paper by a team led by Zhu Jun, Shengshu AI’s chief scientist and a computer science professor at Tsinghua University.
Bao Fan, another Tsinghua author of the paper, is Shengshu’s chief technology officer. Shengshu’s chief executive, Tang Jiayu, is a graduate of Tsinghua’s department of computer science and technology.
In an interview in April, Tang told local media that it would be easier for Chinese firms to catch up with Sora than with GPT-4, the advanced OpenAI large language model that powers ChatGPT. He did not elaborate.
In addition to text-to-video and image-to-video generation, Vidu has added a function that lays the foundation for commercialising the technology, given its potential uses in the animation and content industries, Zhang Xudong, product director at Shengshu AI, told the Post in an interview.
The new character-to-video function lets users upload an image of a real person or an animated character and then use simple text prompts to bring it to life.
“In the future we hope [users] could upload multiple characters and [describe] scenes, and have them act in those scenes, similar to how a film is being produced,” Zhang said. “Our goal is to integrate AI tools with traditional sectors.”
Shengshu, which has raised tens of millions of US dollars, counts Qiming Venture Partners, search giant Baidu, Alibaba Group Holding’s fintech affiliate Ant Group, and the Beijing AI Industry Investment Fund as its backers. Alibaba owns the Post.