
Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

arstechnica.com


54 comments
  • The actual research page is so awkward. The TLDR at the top goes:

    single portrait photo + speech audio = hyper-realistic talking face video

    Then a little lower comes the big red warning:

    We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.

    No siree! Big "not what it looks like" vibes.

  • Trained on YouTube clips

    It could have been worse. Imagine if it had been trained on TikTok clips.

  • Well, just watch the documentary "The Masked Scammer" and you'll see how this can (and definitely will) go wrong. For a summary, there's the Wikipedia article on Gilbert Chikli.
