With Spotify’s AI DJ, the company trained an AI on the voice of a real person: Xavier “X” Jernigan, Spotify’s head of cultural partnerships and a podcast host. Now the streamer may be looking to turn that same technology toward advertising. According to statements by The Ringer founder Bill Simmons, the streaming service is developing AI technology that can use a podcast host’s voice to create host-read ads, without the host actually having to read and record the ad copy.
Simmons made the statements on a recent episode of The Bill Simmons Podcast, saying, “There will be a way to use my voice for the ads. You obviously have to approve the voice, but it opens up, from an advertising standpoint, all these different great possibilities for you.”
He said these ads could open up new possibilities for podcasters because they could be targeted geographically, such as tickets to a local event in the listener’s city, or even created in different languages, with the host’s permission.
His comments were first reported by Semafor.
The Ringer was acquired by Spotify in 2020, but it wasn’t clear whether Simmons was authorized to speak about the streamer’s plans in this area, as he started by saying: “I don’t think Spotify will get mad at me for this…” before sharing the information.
Reached for comment, Spotify would not directly confirm or deny development of the feature.
“We’re always working to improve the Spotify experience and test new offerings that benefit creators, advertisers and users,” a Spotify spokesperson told TechCrunch. “The AI landscape is evolving rapidly and Spotify, which has a long history of innovation, is exploring a wide range of applications, including our hugely popular AI DJ feature. There has been a 500% increase in the number of daily podcast episodes discussing AI over the past month, including the conversation between Derek Thompson and Bill Simmons. Ads provide an interesting canvas for future research, but we have nothing to announce at this time.”
The subtext of this statement suggests that Simmons’ comments may have been somewhat premature.
That said, Spotify has already hinted that the AI DJ in today’s app won’t be the only AI voice users will encounter in the future. When Jernigan was recently asked about Spotify’s plans to work with other voice models in the future, he teased, “Stay tuned.”
The streamer has also quietly invested in AI development and research, with a team of a few hundred people now working in areas like personalization and machine learning. In addition, the team is working with OpenAI models and exploring the possibilities of large language models, generative voice, and more.
Spotify’s ability to create AI voices specifically leverages IP from its 2022 acquisition of Sonantic in conjunction with OpenAI technology. Spotify recently told us it could choose to use its own in-house AI technology in the future.
To create AI DJ, Spotify had Jernigan go into a studio to produce high-quality recordings, including ones where he read lines with different cadences and emotions. He kept his natural pauses and breaths in the recordings and made sure to use the language he already uses, like “tunes” or “bangers” rather than just “songs.” All of this was then fed into the AI model, which created the AI voice.
The company declined to describe the process in more detail or say how long it took to convert Jernigan’s recordings into an AI DJ. But given its potential interest in turning its podcast hosts into AI voice models, it would need to develop a fairly efficient process here, and one that could potentially leverage a podcaster’s existing recordings.
While AI voices aren’t new, the ability to make them sound like real people is a more recent development. A few years ago, Google stunned the world with Duplex, a human-sounding AI that could call restaurants to make reservations. But the technology was initially criticized for its lack of disclosure. This month, Apple introduced an accessibility feature, Personal Voice, that can mimic a user’s own voice after they first train the model by spending 15 minutes reading a set of randomly chosen prompts, with the processing run locally on their device.