Apple, Nvidia and Anthropic used thousands of YouTube videos to train AI

In response to the lawsuits, defendants such as Meta, OpenAI and Bloomberg have argued that their actions constitute fair use. A lawsuit against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs.

Litigation in the remaining cases is still in its early stages, so issues surrounding permission and payment have yet to be resolved. The Pile has since been removed from its official download site, but is still available on file-sharing services.

“Tech companies have acted in a reckless manner,” said Amy Keller, a consumer protection attorney and partner at the law firm DiCello Levitt, which has filed lawsuits on behalf of creatives whose work was allegedly harvested by artificial intelligence companies without their consent.

“People are concerned about the fact that they had no choice in this matter,” Keller said. “I think that’s what’s really problematic.”

Repeating a parrot

Many creators feel uncertain about the path ahead.

Full-time YouTubers are on the lookout for unauthorized use of their work, sending out takedown notices regularly, and some worry that it’s only a matter of time before AI can generate content similar to what they make, or even produce direct imitations.

Pakman, the creator of The David Pakman Show, saw the power of AI recently while browsing TikTok. He came across a video that was labeled as a Tucker Carlson clip, but when Pakman watched it, he was shocked. It sounded like Carlson, but it was, word for word, what Pakman had said on his YouTube show, down to the cadence. He was equally alarmed that only one of the video’s commenters seemed to recognize that it was fake: a voice clone of Carlson reading Pakman’s script.

“This is going to be a problem,” Pakman said in a YouTube video about the fake. “You can do this with basically anyone.”

Sid Black, co-founder of EleutherAI, wrote on GitHub that he compiled the YouTube captions using a script. That script downloads the captions from the YouTube API in the same way a YouTube viewer’s browser downloads them when watching a video. According to documentation on GitHub, Black used 495 search terms to select videos, including “funny vloggers,” “Einstein,” “black protestant,” “social protection services,” “infowars,” “quantum chromodynamics,” “Ben Shapiro,” “Uyghurs,” “fruitarian,” “cake recipe,” “Nazca lines,” and “flat earth.”

Although YouTube’s terms of service ban accessing its videos by “automated means,” more than 2,000 GitHub users have bookmarked or endorsed the code.

“There are many ways YouTube could prevent this module from working if that is what they are after,” machine learning engineer Jonas Depoix wrote in a discussion on GitHub, where he posted the code Black used to access YouTube subtitles. “This has not happened so far.”

In an email to Proof News, Depoix said he hasn’t used the code since he wrote it as a college student for a project several years ago and was surprised that people found it useful. He declined to answer questions about YouTube’s rules.

Google spokesman Jack Malon said in an emailed response to a request for comment that the company has taken “steps over the years to prevent abusive and unauthorized scraping.” He did not respond to questions about other companies using the material as training data.

Among the videos used by AI companies are 146 from Einstein Parrot, a channel with nearly 150,000 subscribers. The African grey parrot’s keeper, Marcia, who did not want to use her last name for fear of endangering the famous bird’s safety, said she initially found it amusing to learn that AI models had picked up the words of a mimicking parrot.

“Who would want to use a parrot’s voice?” Marcia said. “But I know he speaks very well. He speaks in my voice. So he imitates me and then the AI imitates the parrot.”

Once the AI ingests the data, it cannot be unlearned. Marcia was concerned about all the unknown ways her bird’s information could be used, including creating a digital duplicate of the parrot and, she worried, making it swear.

“We are entering uncharted territory,” Marcia said.
