Accessible and ‘a pleasure to read’: How Apple podcast transcripts came to be

Ren Shelburne was tired of trying to listen to episodes of popular podcasts that her friends recommended. Shelburne, a photographer with partial hearing loss and an auditory processing condition, remembers struggling to finish one episode in particular. It was a specific kind of show: too many talking heads, complicated overlapping dialogue and, until recently, no transcripts. “The ones where I’m so lost because there’s too much going on at once,” Shelburne says. She couldn’t follow it, so she couldn’t talk about the program with her friends. “Podcasts are a huge part of pop culture and media right now,” she says. “I want to be able to be part of that conversation.”

The weekly podcast audience in the United States has more than quadrupled in the last decade, according to industry research. For some, however, the medium still feels inaccessible.

“Sometimes I miss something because of my hearing loss,” says Alexandra Wong, a Rhodes scholar who studies digital accessibility, “and I have to go back and rewind it about five or six times to make sure I can understand what’s happening.”

Shelburne and Wong are among the approximately 15% of adults in the US, about 37.5 million people, who report some difficulty hearing without assistance, many of whom rely on captions and transcripts to follow music, movies and podcasts. Video streaming companies like Netflix, Peacock and Hulu offer subtitles for almost all of their programming, and time-synchronized lyrics have become increasingly standard in music streaming. Captioning has been embraced by audiences beyond the disability community; 80% of Netflix viewers turn on subtitles at least once a month.

In contrast, podcasting companies have arrived late to the accessibility game. Sirius XM and Gimlet have faced lawsuits claiming they violated the Americans with Disabilities Act by not providing transcripts. Spotify, which owns Gimlet, launched podcast transcriptions in September, but the feature is only available for shows the streaming service owns and podcasts hosted on its platform.

Apple announced in March that automatically generated transcripts would be available for any new podcast episode played in its app on iPhones and iPads running the latest version of its operating system.

“Our goal is obviously to make podcasts more accessible and more immersive,” says Ben Cave, Apple’s global director of podcasts.

Sarah Herrlinger, who manages Apple’s accessibility policy, says developing the transcription tool involved working with both disabled Apple employees and outside organizations. Transcription became a priority for Apple Podcasts, she says, due to growing demand from both disabled users and podcast creators.

“This is one of the most requested features by creators,” says Cave. “They’ve been asking for this.”

Apple’s journey into podcast transcriptions began with the expansion of a different feature: indexing. It’s a common origin story at several tech companies like Amazon and Yahoo: What starts as a search tool evolves into a full-fledged transcription initiative. Apple first implemented software that could identify specific words in a podcast in 2018.

“What we did then was offer a single line of the transcript to give users context about a result when they search for something in particular,” Cave recalls. “There are a few different things we did in the intervening seven years, which came together in this [transcripts] feature.”

Cave says one of the big hurdles over the years of development was ensuring a high level of performance, display quality and accuracy. A number of big advances came from accessibility innovation in other departments at Apple.

“In this case, we took the best of what we learned from reading on Apple Books and lyrics on Apple Music,” Herrlinger says. Apple Podcasts transcripts borrow the time-synchronized, word-by-word highlighting of Apple Music lyrics and use the Apple Books font and a high-contrast color scheme for visually impaired readers.

Apple has tried to make up for its late start by offering a more comprehensive transcription feature than its competitors’. Amazon Music has offered automatically generated transcripts for podcasts since 2021, but only for its original programs and some other popular shows, with captions appearing as block text rather than highlighted word by word. Spotify’s transcription feature, launched in September 2023, includes word-by-word highlighting, but it’s only available for Spotify Originals and shows hosted on its platform.

In its war with Apple over music and podcast streaming, Spotify launched an AI-powered translation tool last fall, intended to offer popular podcasts in French, German and Spanish. The company appears to have fallen short of the specific promises it made at the time. Currently, the streaming service seems to offer only Spanish translations, and predominantly for a single podcast: The Lex Fridman Podcast. In its announcement, Spotify named several podcasts as part of its translation program, such as The Rewatchables and What Now? with Trevor Noah, which had no transcripts available in languages other than English at the time of publication, nine months later. Spotify declined to comment when asked about these discrepancies.

Apple’s podcast app will transcribe every new episode uploaded. “We wanted to do it for all the shows, so it’s not just a small portion of the catalog,” Cave says. “It wouldn’t be appropriate for us to put an arbitrary limit on the number of programs that get it… We think it’s important from an accessibility standpoint because we want to give people the expectation that transcripts are available for everything they want to interact with.”

Cave adds, “Over time, the entire library of episodes will be transcribed.” However, he says Apple is prioritizing transcripts of new content and declines to say when transcripts of back catalogs might arrive.

Disability advocates and users said they believed Apple’s explanation that the company had been working to get the feature right rather than releasing a bad product, even though it lagged behind its competitors. They said they would rather wait for a well-made accessibility tool than get a rushed one.

“I respect that. Having captions and transcripts that are inaccurate just defeats the purpose,” Shelburne said.


“I was amazed at how accurate it was,” says Larry Goldberg, a pioneer in media accessibility and technology who created the first closed-captioning system for movie theaters. That level of fidelity has long been missing from automatic transcription, he adds. “It’s gotten better, it’s gotten better… but there are times when it gets it so wrong.”

Goldberg and other experts have called YouTube’s auto-generated captions tool, available since 2009, a mediocre and rushed product. While YouTube says its transcriptions improved notably in 2023, the tool has come under frequent criticism over the years for its lack of accuracy, earning the nickname “craptions” from some critics when it confuses words, mistaking “corrector” for “zebra” and “wedding” for “lady.”

Goldberg remembers being surprised by a colleague’s reaction while discussing the consequences of misinformation from an unreliable transcript: It’s better than nothing.

“Is that your quality standard? Better than nothing?” Goldberg said. “No, that’s just not good enough.”

In addition to making sure the transcripts capture speech as accurately as possible, Apple’s Cave says a lot of work went into training the software to know what to leave out.

“We also want to make reading a pleasure… That means we wanted to reduce the relative importance of things like filler words, like ‘ums’ and ‘ahs.'” For podcast creators who want those vocal disfluencies transcribed, Apple says they will have to upload a custom transcript.

The folks at Apple are already noticing the transcription tool being put to a variety of surprising uses.

“We’re seeing a lot of users interact with transcripts in the language learning space,” Cave says, noting that Apple Podcast Transcripts support podcasts in English, Spanish, French, and German.

“We often find that by building for the margins, we make a better product for the masses,” Herrlinger says. “Other communities will find those features and find ways to use them in some cases where we know they could benefit someone else.”

Goldberg, the accessibility expert, wants other platforms to adopt Apple’s approach to transcription. His dream is for more companies to start treating podcast transcriptions with the same priority given to video content.

“I used to refer to my job as begging. ‘Please, please put captions on your video!’ Not anymore. Oh no. Everyone is doing it,” he says, referring to his work founding the National Center for Accessible Media at the Boston public broadcaster WGBH. He says the norm now is that “you just don’t upload videos to the internet without captions.” He hopes podcasts will follow suit.

Wong, the Rhodes scholar, also praises Apple Podcasts’ transcription feature, though she sees room for improvement. She is somewhat cautious about the tool’s handling of complex and unusual terms.

“Since it is automatically generated, errors can occur, with names misspelled in the transcripts and really difficult scientific terms,” Wong says. Apple acknowledges that better name recognition is already on its radar and says it plans to expand the feature to more languages.
