In an exciting development, Stability AI has announced that the generative art produced by its Stable Diffusion models can now be animated. The company has officially released a new product, Stable Video Diffusion, as a research preview. The release lets users transform a single image into a video, marking a significant advance in generative AI video modeling.
Stable Video Diffusion consists of two image-to-video models, generating clips of 14 and 25 frames respectively. The models run at customizable frame rates between 3 and 30 frames per second and produce high-quality output at a resolution of 576 x 1024. The tool is also capable of multi-view synthesis from a single frame when fine-tuned on multi-view datasets. In external evaluation, the models surpassed leading closed models in user preference studies, positioning Stable Video Diffusion as a notable addition to the generative AI landscape.
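Taken together, the published specs imply fairly short clips. A minimal sketch (plain Python, independent of the model itself) that turns the stated frame counts and frame rates into playback durations:

```python
# Sanity-check the clip durations implied by Stable Video Diffusion's
# published specs: 14- or 25-frame clips, played back at 3 to 30 fps.

FRAME_COUNTS = (14, 25)  # frames per generated clip (the two models)
FPS_RANGE = (3, 30)      # supported playback rates in frames per second

def clip_duration(frames: int, fps: int) -> float:
    """Playback length in seconds for a clip of `frames` frames at `fps`."""
    return frames / fps

# Extremes under the stated specs.
longest = clip_duration(max(FRAME_COUNTS), min(FPS_RANGE))   # 25 frames at 3 fps
shortest = clip_duration(min(FRAME_COUNTS), max(FPS_RANGE))  # 14 frames at 30 fps

print(f"shortest clip: {shortest:.2f} s, longest clip: {longest:.2f} s")
```

At typical mid-range frame rates the durations land well under four seconds (for example, 25 frames at 7 fps is roughly 3.6 seconds), which is consistent with the short-clip limitation discussed below.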
To showcase the capabilities of Stable Video Diffusion, Stability AI has provided a research preview, allowing researchers and developers to explore the potential applications in various sectors. While the company has emphasized that Stable Video Diffusion is currently available exclusively for research purposes, the public can sign up to be placed on a waitlist for access to an upcoming web interface. This interface is expected to feature a text-to-video application, which will demonstrate the tool’s versatility in industries such as advertising, education, and entertainment.
As the demonstration video above shows, the results generated by Stable Video Diffusion are high quality compared with other generative systems, but the tool has clear limitations. It generates only relatively short videos (less than 4 seconds), falls short of perfect photorealism, cannot render camera motion other than slow pans, and offers no control via text. It may also struggle to produce legible text and faithful faces, highlighting areas where further improvement is needed.
Stable Video Diffusion was trained on a large dataset of millions of videos and then fine-tuned on a smaller set. While Stability AI has not disclosed the specific origin of the dataset, it has emphasized that the video content used for training was publicly available for research purposes. The provenance of training data is a sensitive topic for the company: Stability AI was recently sued by Getty Images for allegedly scraping its image archives.
Despite its promise for content creation, generative AI also raises concerns about deepfakes, copyright violations, and other forms of misuse. Stability AI, in particular, has faced challenges in commercializing its products and has been the subject of financial scrutiny. The company's push into video generation has been fraught with difficulties, including a notable resignation and ongoing legal disputes over copyright use. This underscores the complexities and ethical considerations involved in developing and deploying generative AI technology.
In conclusion, Stable Video Diffusion represents a significant milestone in the evolution of generative AI, offering a glimpse into the future of content creation and manipulation. While its capabilities and limitations are evident, the tool has opened doors to new possibilities in various industries. As researchers and developers await access to the full potential of Stable Video Diffusion, it remains critical to address the ethical, legal, and commercial challenges associated with the advancement of generative AI.