Taylor Bell

Published on Apr 18, 2024

It can generate a talking face video with a portrait photo and speech audio.

Key Takeaways

  • Microsoft has introduced VASA, an AI-driven framework to generate lifelike talking faces from images and audio.
  • VASA-1 delivers 512×512 videos at 40fps with realistic facial dynamics and low latency.
  • Microsoft is cautious about releasing VASA due to misuse concerns and authenticity issues.

If you're feeling video fatigue and don't want to appear on camera during a Microsoft Teams meeting, you can use the Avatar feature. It creates a 3D avatar of you that animates based on your audio cues, with no webcam needed. Microsoft has now gone a step further with a new AI technology that could take webcam-free video conferencing to the next level.

Microsoft develops VASA to generate talking faces from a static image and an audio clip

Screenshot showing lifelike talking faces

Microsoft Research has introduced a new framework called VASA that can generate "hyper-realistic" talking faces, complete with lifelike facial behavior, from just a single portrait image and a speech audio clip. Microsoft has also showcased how the technology produces high-quality videos with realistic facial expressions, and how it could be useful in scenarios that require real-time engagement, such as video conferencing in Microsoft Teams.

Microsoft claims that the first VASA model, dubbed VASA-1, "not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512×512 videos at up to 40 FPS with negligible starting latency". The exact figures depend on the mode: in offline batch processing mode, it generates 512×512 frames at 45fps, while in online streaming mode it supports up to 40fps with a preceding latency of only 170ms.

Microsoft has no plans to release VASA yet

While the demo video looks promising, Microsoft appears to be very cautious about bringing the technology to its services. A major issue holding the software giant back is uncertainty over whether the technology can be used responsibly. The company has said it will work on forgery detection technology to help prevent misuse.

There is another important issue Microsoft wants to address before any public launch. The company acknowledges that the technology is far from perfect, as the generated videos are not yet as authentic as naturally captured ones. Only time will tell when, or whether, these improvements will arrive and make VASA something we can all benefit from.
