Alibaba Introduces New Video Generation Tool
Tora uses the Diffusion Transformer (DiT) architecture, the same framework used by OpenAI's Sora, a text-to-video model launched in February.
Alibaba Group Holding, the Chinese e-commerce giant, is developing Tora, a video-generation tool built on the same DiT architecture as Sora, which OpenAI unveiled in February. The tool is detailed in a paper released by five Alibaba researchers.[1]
According to the paper, Tora is the first trajectory-oriented DiT framework for video generation, producing precise movement along user-specified paths while mimicking real-world dynamics. The researchers adapted OpenSora's data-processing workflow to transform raw videos into high-quality video-text pairs, and used an optical flow estimator to extract motion trajectories from the training footage.
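To give a flavour of what trajectory extraction means here, the toy sketch below tracks a single point through a short clip. It uses naive exhaustive block matching as a crude stand-in for a real optical-flow estimator (Tora's actual pipeline and model are not public in code form); all function names and parameters are illustrative assumptions, not Alibaba's implementation.

```python
import numpy as np

def estimate_shift(prev, curr, pt, patch=5, search=6):
    """Estimate how the patch around `pt` moved between two grayscale
    frames via exhaustive block matching -- a crude stand-in for a
    learned optical-flow estimator."""
    y, x = pt
    ref = prev[y - patch:y + patch + 1, x - patch:x + patch + 1]
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[y + dy - patch:y + dy + patch + 1,
                        x + dx - patch:x + dx + patch + 1]
            if cand.shape != ref.shape:  # skip windows falling off the frame
                continue
            err = np.sum((ref - cand) ** 2)
            if err < best:
                best, best_dy, best_dx = err, dy, dx
    return best_dy, best_dx

def extract_trajectory(frames, start_pt):
    """Follow one point frame-to-frame, returning its (row, col) path."""
    path = [start_pt]
    for prev, curr in zip(frames, frames[1:]):
        dy, dx = estimate_shift(prev, curr, path[-1])
        path.append((path[-1][0] + dy, path[-1][1] + dx))
    return path

# Synthetic clip: a bright square drifting 2 pixels right per frame.
frames = []
for t in range(4):
    f = np.zeros((40, 40), dtype=float)
    f[15:25, 5 + 2 * t:15 + 2 * t] = 1.0
    frames.append(f)

print(extract_trajectory(frames, (20, 10)))
```

The recovered path of (row, col) points is exactly the kind of per-object trajectory that, in Tora, conditions the DiT so generated motion follows a drawn path.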
Tora can generate videos guided by trajectories, images, text, or a combination of the three. Alibaba has not disclosed when Tora will be available to the public.
Previously, in February, Alibaba introduced Emote Portrait Alive (EMO), an AI model that creates animated avatar videos from a single still image and an audio sample.
In February, OpenAI, the company behind ChatGPT, introduced its first artificial intelligence (AI)-powered text-to-video generation model, Sora.
Sora turns text descriptions into video content, giving content creators, filmmakers, and storytellers a new way to produce visuals; OpenAI uses its GPT models to expand short user prompts into the detailed captions that guide generation.
Sora can create detailed scenes with multiple characters, specific types of motion, and accurate subject and background details.