Lawsuits are never exactly a love fest, but the copyright fight between The New York Times and OpenAI and Microsoft is becoming especially contentious. This week, the Times alleged that OpenAI engineers inadvertently deleted data that the newspaper’s team spent more than 150 hours extracting as potential evidence.
OpenAI was able to recover much of the data, but the Times’ legal team says the original file names and folder structure are still missing. According to a statement filed in court on Wednesday by Jennifer B. Maisel, an attorney for the newspaper, this means the information “cannot be used to determine where the plaintiffs’ copied articles” may have been incorporated into OpenAI’s artificial intelligence models.
“We do not agree with the characterizations made and will present our response soon,” OpenAI spokesperson Jason Deutrom told WIRED in a statement. The New York Times declined to comment.
The Times filed its copyright lawsuit against OpenAI and Microsoft last year, alleging that the companies had illegally used its articles to train artificial intelligence tools like ChatGPT. The case is one of many ongoing legal battles between artificial intelligence companies and publishers, including a similar lawsuit filed by the Daily News that is being handled by some of the same lawyers.
The Times case is currently in discovery, meaning both sides are turning over requested documents and information that could become evidence. As part of the process, the court required OpenAI to show the Times its training data, which is a big deal: OpenAI has never publicly revealed exactly what data was used to build its AI models. To disclose it, OpenAI created what the court calls a “sandbox” of two “virtual machines” that Times lawyers could examine. In her statement, Maisel claimed that OpenAI engineers had “wiped” the data organized by the Times team on one of these machines.
According to Maisel’s filing, OpenAI acknowledged that the information had been removed and attempted to fix the issue shortly after it was alerted earlier this month. But when the newspaper’s lawyers examined the “restored” data, they found it too disorganized to use, forcing them to “recreate their work from scratch using significant man hours and computer processing time,” several other Times lawyers said in a letter submitted to the judge on the same day as Maisel’s statement.
The lawyers noted that they had “no reason to believe” that the deletion was “intentional.” In emails submitted as evidence along with Maisel’s statement, OpenAI lawyer Tom Gorman referred to the data deletion as a “bug.”