Researchers from the University of Cambridge have developed a new AI system capable of generating highly realistic video, synchronized to the audio, from a single input such as a speech recording. The system, detailed in a paper published in Science Advances, uses a novel neural network architecture to model the complex relationship between vocal acoustics and facial movements. It can produce a talking-head video that matches the speaker’s lip movements, expressions, and even subtle head motions directly from the audio waveform, without requiring existing video of the speaker. The technology has potential applications in film dubbing, virtual avatars, and assistive communication, but the researchers emphasize the serious ethical concerns around its potential for creating deepfakes and spreading misinformation. They have called for the development of robust detection methods and public education alongside the technological advancement. Read the full article at https://www.sciencedaily.com/releases/2023/10/231012105522.htm.
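The summary above doesn't describe the paper's actual architecture, so purely as an illustration of the general idea, the sketch below shows a toy audio-to-facial-motion regressor of the kind commonly used in audio-driven talking-head animation: acoustic features go in, per-frame facial motion parameters come out. The module names, layer sizes, and the choice of mel-spectrogram input are all assumptions for demonstration, not details from the Cambridge system.

```python
# Illustrative sketch only: a generic audio-to-facial-motion regressor,
# NOT the architecture from the Science Advances paper. All dimensions
# and names are hypothetical.
import torch
import torch.nn as nn

class AudioToFaceMotion(nn.Module):
    """Maps a mel-spectrogram sequence to per-frame facial motion parameters
    (e.g., lip/jaw/expression coefficients for a head model)."""
    def __init__(self, n_mels: int = 80, n_motion_params: int = 64, hidden: int = 256):
        super().__init__()
        # Convolutional front end summarizes local acoustic context per frame.
        self.audio_encoder = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Recurrent layer captures longer-range timing (prosody, head motion).
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        # Linear head regresses one motion-parameter vector per audio frame.
        self.head = nn.Linear(hidden, n_motion_params)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, time)
        x = self.audio_encoder(mel)              # (batch, hidden, time)
        x, _ = self.temporal(x.transpose(1, 2))  # (batch, time, hidden)
        return self.head(x)                      # (batch, time, n_motion_params)

if __name__ == "__main__":
    model = AudioToFaceMotion()
    fake_mel = torch.randn(1, 80, 200)   # roughly 2 s of audio at 100 frames/s
    motion = model(fake_mel)
    print(motion.shape)                  # torch.Size([1, 200, 64])
```

In a full pipeline the predicted motion parameters would drive a separate renderer that produces the video frames; this sketch covers only the audio-to-motion mapping.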