Sesame’s new AI voice model features uncanny imperfections, and it’s willing to act like an angry boss.
In late 2013, the Spike Jonze film Her imagined a future where people would form emotional connections with AI voice assistants. Nearly 12 years later, that fictional premise has veered closer to reality with the release of a new conversational voice model from AI startup Sesame that has left many users both fascinated and unnerved.
It feels reminiscent of the way narrators used to do books on tape. Modern ones are better imho, but all the pausing and intonation definitely seem more "professional" than conversational. Still extremely good.
It sounds a bit off. But if you're not looking for it, you won't find it. That, I believe, is enough to fool most everyone, which is arguably a bad thing.
For reference, this is who Maya reminds me of: Merle Dandridge, the VA for Half-Life 2's Alyx Vance.
I've skipped to some slower commentary, just so that you can kinda see what a human and AI can sound like with similar pacing while reflecting on a question:
Or go try the demo. It IS eerily uncanny. It's not at the point where it would fool you for long, but it's close enough to get caught up in it from time to time.
Wow, Jesus. Maya has the same conversational style that Merle Dandridge has in the Half-Life 2 commentary tracks (in commentary mode).
That is, even for AI, eerily good. Honestly, I'm not sure if I could always tell them apart if I was fed a bunch of voice clips blind, and asked which ones were human, and which were AI.