Google DeepMind, the AI research division, recently unveiled V2A, a new model capable of generating audio for videos.

Video generation models like Sora, Dream Machine, Veo, and Kling are rapidly advancing, enabling users to create videos from text prompts. However, most of these systems produce silent videos. Google DeepMind is addressing this gap with a new AI model designed to generate soundtracks and dialogue for videos.

In a recent blog post, the tech giant’s AI research lab introduced V2A (Video-to-Audio), an innovative AI model that ‘combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.’

Compatible with Veo, the text-to-video model unveiled at Google I/O 2024, V2A can add dramatic music, realistic sound effects, and dialogue that matches the video’s tone. Google states that the new model is also effective with ‘traditional footage’ like silent films and archival material.

The new V2A model can generate an ‘unlimited number of soundtracks’ for any video and includes optional ‘positive prompt’ and ‘negative prompt’ features to customize the output. It also watermarks the generated audio with SynthID technology.
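
Since V2A has not been released publicly, there is no real API to show; the sketch below is purely hypothetical and simply illustrates how the ‘positive prompt’ and ‘negative prompt’ controls described above could be expressed as a request, with invented names throughout.

```python
# Hypothetical sketch only: V2A has no public API. The request shape below is
# invented to illustrate the positive/negative prompt controls, not DeepMind's
# actual interface.
from dataclasses import dataclass

@dataclass
class V2ARequest:
    video_path: str            # input video (e.g. a Veo clip or archival footage)
    positive_prompt: str = ""  # steer generation toward these sounds
    negative_prompt: str = ""  # steer generation away from these sounds
    num_soundtracks: int = 3   # V2A can produce many candidate soundtracks per video

request = V2ARequest(
    video_path="archival_clip.mp4",
    positive_prompt="cinematic score, distant thunder, rain on pavement",
    negative_prompt="dialogue, crowd noise",
)
```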

DeepMind’s V2A technology uses a diffusion model trained on a mix of audio, dialogue transcripts, and video, and it takes natural language sound descriptions as input alongside the video itself. Because video training data is limited, the output can sometimes sound distorted. Google has stated that V2A will not be publicly released anytime soon, to prevent misuse.
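
V2A’s exact architecture is not public, but the general shape of conditional diffusion sampling is well known: audio is iteratively refined from random noise while being conditioned on video features and a text prompt. The toy loop below sketches that idea under stated assumptions; the denoiser, update rule, and all names are illustrative, not DeepMind’s implementation.

```python
# Illustrative sketch of conditional diffusion sampling, assuming a trained
# `denoiser` network is available. All names here are hypothetical.
import torch

def sample_audio(denoiser, video_features, prompt_embedding, steps=50, length=48000):
    # Start from pure Gaussian noise in the audio space.
    audio = torch.randn(1, length)
    for t in reversed(range(steps)):
        t_batch = torch.full((1,), t)
        # The denoiser predicts the noise present at step t, conditioned on
        # the video features and the text prompt embedding.
        predicted_noise = denoiser(audio, t_batch, video_features, prompt_embedding)
        # Remove a fraction of the predicted noise. Real samplers (DDPM, DDIM)
        # use a derived noise schedule rather than this simplified update.
        audio = audio - predicted_noise / steps
    return audio
```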
