11mo ago

Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

arstechnica.com

Technology @lemmy.world

return2ozma @lemmy.world

12mo ago


arstechnica.com/information-technology/2024/04/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track/

Technology @lemmy.ml

pelespirit @sh.itjust.works

12mo ago


54 comments

What can possibly go wrong?

Why would you develop this technology? I simply don’t understand. All involved should be sent to jail. What the fuck.
- They worded the headline that way to scare you into that reaction. They're only interested in telling you about the negative uses because that drives engagement.
  
  I understand AI evangelists - which you may or may not be idk - look down on us Luddites who have the gall to ask questions, but you seriously can’t see any potential issue with this technology without some sort of restrictions in place?
  You can’t see why people are a little hesitant in an era where massive international corporations are endlessly scraping anything and everything on the Internet to dump into LLMs et al. and use against us to make an extra dollar?
  You can’t see why people are worried about governments and otherwise bad actors having access to this technology at scale?
  I don’t think these people should be locked up or all AI usage banned. But there is definitely a middle ground between absolute prohibition and no restrictions at all.
  
  Honestly that's a good rule of thumb for all headlines at this point.
  
  Good point good point
- They mentioned one potential use that I thought has value and that I hadn't considered. For video conferencing, this could transmit data without sending video and greatly reduce the amount of bandwidth needed by rendering people's faces locally. I don't think that outweighs the massive harms this technology will unleash. But at least there was some use that would be legit and beneficial.
  I'm someone who has a moral compass and I don't like that scammers will abuse this shit so I hate it. But there's no keeping it locked away. It's here to stay. I hate the future / now.
  
  Wouldn't you then have to run the AI locally on a machine (which probably draws a lot of power and memory) or use it via the cloud (which depends on bandwidth just like a video call)? I don't really see where this technology could actually be useful. Sure, it might work if it were only a minor computation, like when you take a picture or video with any modern smartphone. But computing an entire face and voice seems much more complicated than that and not really feasible for the usual home device.
  
  Also I would argue sending the actual video of what is happening in front of the camera is kind of the entire point of having a video call. I don’t see any utility in having a simulated face to face interaction where neither of you is even looking at an actual image of the other person.
- You can’t simply not develop a technology. Progress is going to move forward. If they don’t do it, somebody else is going to figure out how. The tools are out there. The math works. Better researchers to do it now and scare us into finding solutions than criminals to develop it first.
- Other than the obvious malicious uses of this technology, it could be great for multimedia, great for creative control for cast, great for virtual meetings to always look “your best” (as determined by each individual, e.g. clean-cut pristine, and/or preferred gender, and/or favorite anime, etc.). There are also use cases to hear letters spoken by a lost loved one, or replace the Three Stooges with politicians. Tons of “safe” use cases that I am looking forward to.
  
  This is a really positive take. I would love to create such an AI of myself in my likeness so that if one day I pass away before my wife, she could have that comfort. I imagine it speaking like: while I’m not your husband, here’s what I think he would’ve said.
  Deep faking myself so I don’t have to use my camera in meetings? I would pay for that feature.
  
  I'm not convinced any of these uses are actually beneficial. They mostly range from creepy to pointless.
- Because bags of money. And MS is a hyper toxic entity that’s been siphoning the data of every Windows user for decades now. That company is basically IBM during WW2.
- If something is possible, and this simply indeed is, someone is going to develop it regardless of how we feel about it, so it's important for non-malicious actors to make people aware of the potential negative impacts so we can start to develop ways to handle them before actively malicious actors start deploying it.
  Critical businesses and governments need to know that identity verification via video and voice is much less trustworthy than it used to be, and so if you're currently doing that, you need to mitigate these risks. There are tools, namely public-private key cryptography, that can be used to verify identity in a much tighter way, and we're probably going to need to start implementing them in more places.
- Would be great for me and others who have trouble with body language. I could deepfake a version of myself with neurotypical body language and offload the effort of "acting normal" to the AI for interviews and video calls. Genuinely I'm super pumped for this.
  
  Now that is interesting, I've never heard this consideration before.
- They're also releasing a detector, for what it's worth.
  Yeah, this one seems like it will have more negative applications than positive. Usually you'll have a lot more content from someone you want to copy for non-deceptive reasons. It's inevitable all video will be easily fake-able one day soon, but why hasten it?

The actual research page is so awkward. The TLDR at the top goes:
single portrait photo + speech audio = hyper-realistic talking face video
Then a little lower comes the big red warning:
We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.
No siree! Big "not what it looks like" vibes.

Someone help me out, please. Who was the 90s sci-fi author who predicted actors would go away and all movies would be made using CGI/AI? She had characters in the book watching movies starring Humphrey Bogart and John Wayne as detectives solving crimes (and so on). She also predicted "ractors", people who act in front of a camera so a computer can use their motion and expressions to animate a character on screen in real time.
My feeble brain, I swear... In any case, thanks to her, I knew this day was coming. Gonna be a wild ride though.
- According to Le Chat,
  The author you're thinking of is Neal Stephenson, and the book is "Snow Crash" published in 1992. In the book, he coined the term "ractors" for actors who perform in front of motion-capture cameras to create lifelike animations. He also predicted the use of CGI and AI in filmmaking to create movies with long-dead actors.
  I haven’t read it and the Wikipedia article doesn’t seem to mention virtual actors, so it could be wrong. At least it didn’t hallucinate a fake book.
  
  It just cited the wrong Neal Stephenson book, so not way off.
  
  https://archive.is/ZTU90
  The Atlantic | Neal Stephenson’s Most Stunning Prediction
  The sci-fi legend coined the term metaverse. But he was most prescient about our AI age. By Matteo Wong
  
  Oh snap, thanks - I was mixing up The Diamond Age with another book, yes. Ractors are from Stephenson, but I also had another author's books in my head. See? Feeble mind. There's still another woman author I need to track down and re-read here.
  
  I asked Perplexity “What is the scifi book from the 90s that had “ractors,” where a person would act in front of a camera and a computer would animate a CG model?” and got what other commenters are saying is the correct answer:
  The science fiction book from the 90s that featured "ractors," where a person would act in front of a camera and a computer would animate a CG model, is not directly mentioned in the provided search results. However, based on the description of "ractors" and the context of computer animation and CG models, it seems you might be referring to "The Diamond Age" by Neal Stephenson, published in 1995. In this novel, the term "ractor" (short for "interactive actor") is used to describe performers who participate in interactive theater through virtual reality environments, which could align with the concept of acting in front of a camera to animate a CG model. However, since this specific detail is not found in the search results, this answer is based on existing knowledge outside of the provided sources.

The eyes still have uncanny valley vibes, but that's because I'm looking for it. If I wasn't watching demo videos about generated video, I might not have noticed.
- And that's the problem. The average person isn't looking for it, and will absolutely not see it. As long as it's good enough, that's all that matters. A plausible enough video of Joe Biden talking about rounding up Christians into internment camps that gets shared on Facebook, or something like that which panders to right-wing bigotry, is enough to get people going. Even real images and videos that are miscaptioned are enough, even when a link that disproves the caption is right there.
  People seriously underestimate just how horrifying the possibilities are with this shit. And given how high-stakes this election cycle is, and the state of politics in this country, the tendency for people to latch on to anything that affirms their preexisting ideals creates a fucking minefield.
  
  This is an education problem as much as -- if not more so than -- a tech problem. Before the GOP gutted critical thinking wherever they held a majority and two generations were able to grow up under those circumstances, a video of any current president rounding up Christians would have been roundly rejected as either satirical or disinformation by the vast majority of the population, owing to the absurdity of the idea.
  Once we got to the point of a not-insignificant minority of the population believing that the true power in the United States lies in the basement of a pizza shop with no basement ...

Trained on YouTube clips
It could have been worse. Imagine if it had been trained on TikTok clips.

Sigh, not this article again. No, they can't "deepfake a person with one photo". They can create a bad uncanny-valley 75% accurate version of one.
- a bad uncanny-valley 75% accurate version of one
  Actually a perfect description of what a deepfake is.
  
  I've seen far more convincing deepfakes, to the point I couldn't tell until I was told. I've experimented with this myself. After a bit of trial and error, almost anyone can easily create shockingly convincing deepfakes. One interesting method is using 3D rendered characters with deepfake faces.

I think this has an effect most people don't think of: media will just lose its value as a trusted source of information. Broadcast media will stop being credible, since anything could be faked. Humanity is back to "word of mouth", I guess.

Well, just watch the documentary "The Masked Scammer" and you'll see how this can (and definitely will) go wrong. For a summary, there's the Wikipedia article on Gilbert Chikli.

OMG, stop. What are you guys thinking?
- Money.

Yeah, Microsoft isn't releasing this until we can use it responsibly.
We'll never be able to guarantee that. There will always be people abusing this.
Though right now it's in the hands of Microsoft and likely requires a shit tonne of hardware to run (I'd imagine a collection of specialized servers), this tech WILL come out eventually, and eventually, everyone will be able to run it.
I give it 5-10 years tops before anyone can just do this with anyone. Want to make a movie of trump or Hilary fucking a donkey? Done. Want to make a video of your 5 year old daughter in a gangbang? Done. The future is very bleak.
I'm honestly unsure if the internet was a good idea and I'm even less sure if humanity was a good idea.
