You may not think of it this way, but you probably hear AI voices all the time. When you talk to Alexa or Siri, it’s a model trained on human speech that can say almost anything. Has a celebrity ever given you directions on Waze? AI. And every time you watch TikTok and hear that slightly cheerful voice reading the captions out loud, that’s AI through and through. Heck, Apple’s AI will even read you a romance novel before you go to bed.
AI systems are getting better at converting text into credible speech in almost any language and almost any voice. And in this episode of The Vergecast, the first of our three-part miniseries on AI, that voice is mine. We trained a bunch of different AI tools on the sound of my voice: sometimes we read scripts full of nonsense sentences, sometimes we uploaded hours of existing audio from old Vergecast episodes, sometimes a little of each, to see how well and how quickly we could make a passable AI copy of my voice.
It was… pretty wild. Here is the episode:
And if you want a quick comparison of the different tools, first, here’s the reference speech we used from the great Dwight Schrute:
We transcribed that clip and fed the text into every AI generator we tested. Here’s how Podcastle interpreted it, voiced by AI David Pierce:
Here’s what Descript did with the same text:
And the new Personal Voice feature in iOS 17:
And finally, ElevenLabs, easily the most realistic and impressive of the tools we tested:
Ultimately, I don’t think any of these AI voices are going to replace me. But they are improving very rapidly, and they raise enormous possibilities and enormous questions. What does it mean that I can create such a good replica of my own voice, and that over time these tools will only get better and easier to use? What responsibilities do I have as the person who made it? What responsibilities do other people have?
Obviously, we’re having a lot of discussions about AI music right now, as artists’ voices are being used to train models that can create pretty convincing songs in almost anyone’s voice. That will generate a decade of interesting court cases and ethical debates, but those same questions apply not just to artists, they apply to you and me. How do we use these tools? How do we talk about them? Is it even possible to get good, useful, democratizing things from them without all the deepfakes and problems? We have a lot to figure out, and there’s no time to waste. Because the technology is really good right now, and it’s improving really fast.