Hugging Face Showcases Demos Based On Open Source Text-To-Video Models, Pinpoints Flaws

Hugging Face, the AI developers’ go-to platform, has released AI WebTV, its latest advancement in automatic video and music synthesis. The project aims to advocate for accessible open-source models such as the Zeroscope text-to-video family and MusicGen for music generation.

The technique handles background replacement well during camera pans or rotations. It also gives users creative control over the number of frames generated, which makes high-quality slow-motion effects possible. The primary video model behind the WebTV is Zeroscope V2, and the pipeline around it is implemented in NodeJS and TypeScript.
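For readers who want to experiment, below is a minimal Python sketch of driving a Zeroscope V2 checkpoint through Hugging Face's diffusers library, with the frame count exposed as a parameter. The model ID, resolution, and prompt are illustrative and are not taken from the AI WebTV source code (which, as noted above, is built in NodeJS and TypeScript).

```python
# Minimal sketch: generate one short clip with a public Zeroscope V2 checkpoint.
# Model ID, resolution, and prompt are illustrative, not from AI WebTV itself.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # offload to CPU between steps to fit smaller GPUs

prompt = "a camera slowly panning across a neon-lit city street at night, cinematic"

# num_frames controls clip length; generating more frames and playing them back
# at normal speed is one way to get the slow-motion look described above.
video_frames = pipe(prompt, num_frames=24, height=320, width=576).frames
# Depending on the diffusers version, .frames may include a batch dimension;
# if so, pass video_frames[0] to export_to_video instead.
export_to_video(video_frames, output_video_path="clip.mp4")
```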

The HF pipeline works by taking video shot prompts and passing them through a text-to-video model, producing a sequence of takes. To enhance the creative process further, a human-authored base theme and idea are fed into a large language model, which generates diverse individual prompts for each video clip.
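As a rough illustration of that two-stage flow, the sketch below expands a human-written theme into individual shot prompts with a hosted language model and then renders each prompt with a Zeroscope text-to-video pipeline. The LLM choice, prompt template, and output parsing are assumptions made for illustration; they are not the actual AI WebTV implementation.

```python
# Rough sketch of the flow described above: human theme -> LLM shot prompts -> video clips.
# Model IDs, the prompt template, and output parsing are assumptions, not AI WebTV code.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from huggingface_hub import InferenceClient

llm = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")  # assumed hosted instruct model
pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

theme = "a friendly robot exploring an abandoned greenhouse"  # human-authored base idea
instruction = (
    "Write three short, purely visual text-to-video prompts, one per line, "
    f"for separate camera shots on the theme: {theme}"
)

# Split the LLM output into one prompt per shot; real output may need sturdier parsing.
shot_prompts = [
    line.strip()
    for line in llm.text_generation(instruction, max_new_tokens=200).splitlines()
    if line.strip()
]

for i, shot in enumerate(shot_prompts):
    frames = pipe(shot, num_frames=24, height=320, width=576).frames
    export_to_video(frames, output_video_path=f"shot_{i}.mp4")
```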

Prompt: 3D rendered animation showing a group of food characters forming a pyramid, with a banana standing triumphantly on top. In a city with cotton candy clouds and chocolate road, Pixar’s style, CGI, ambient lighting, direct sunlight, rich color scheme, ultra realistic, cinematic, photorealistic.

Talking about the capabilities of text-to-video models, the HF blog post, authored by Julian Bilcke, stated, “We’ve seen it with large language models and their ability to synthesize convincing content that mimics human responses, but this takes things to a whole new dimension when applied to video.”

The video sequences released alongside the demo are deliberately short, positioning WebTV as a tech demo rather than an actual show with art direction or programming.

Even though the advancement is being lauded, HF has pointed out a few cases where the model fails. Firstly, it can struggle with movement and direction; a clip may, for instance, play in reverse. In certain instances, modifier keywords are not taken into account. Furthermore, words from the prompt sometimes leak into the generated video as visible text.

Source: https://huggingface.co/blog/ai-webtv

Similar to HF’s demo, Meta AI released Make-A-Video in September last year, but that model remains closed source, like the majority of services announced by the tech giant.

Read more: Meta AI Releases A Multimodal Model “CM3leon”  — But Won’t Release It
