Think of any vaguely parenting-related topic and there’s probably a post about it on Mumsnet, the long-running, hugely popular, and controversial UK-based parenting forum for mothers. Over its two-decade-plus history, Mumsnet has amassed an archive of more than six billion words written by its highly engaged user base, on topics from dirty diapers to lazy husbands. (Not to mention a crazy speech about dolphins.)
This spring, after Mumsnet discovered that AI companies were harvesting its data, the company says it decided to try to strike licensing deals with some of the biggest players in the sector, including OpenAI, which initially expressed a willingness to explore a deal after Mumsnet first reached out. After talks with OpenAI failed, Mumsnet announced in July its intention to take legal action.
According to Mumsnet, during those early discussions, an OpenAI strategic partnership official told the company that datasets of more than a billion words were of interest to the AI giant. Mumsnet executives were enthusiastic. “We spent quite a bit of time back and forth with them,” Mumsnet founder and CEO Justine Roberts tells WIRED. “We had to sign some NDAs and they wanted us to give them a lot of information.”
However, more than a month later, OpenAI told Mumsnet that it was no longer interested in partnering at that point, according to an email exchange reviewed by WIRED. When asked why, the OpenAI staffer called Mumsnet’s 6-billion-word dataset too small to justify a licensing deal, Roberts says. The staffer also noted that OpenAI is primarily interested in large datasets that aren’t publicly accessible online, and that it wanted datasets that captured broad human experience.
The company echoed this sentiment when asked for comment by WIRED. “We seek partnerships for large-scale data sets that reflect human society and do not seek partnerships solely for publicly available information,” says OpenAI spokesperson Kayla Wood. “We support choice for publishers and creators, offering them ways to express their preferences for how their sites and content are powered by AI in search results and by training basic generative AI models.”
Roberts says she was “irritated” by this development. She recalls that, at first, OpenAI seemed particularly interested in Mumsnet because of the platform’s content, which is predominantly written by women. “It’s very high-quality conversational data,” she says. “It’s 90 percent female conversations, which is quite unusual.”
OpenAI has closed a variety of data licensing agreements with media outlets and platforms over the past year, entering into agreements with Vox Media, The Atlantic, Axel Springer, Time, and WIRED’s parent company, Condé Nast, as well as with platforms filled with user-generated content, like Reddit. (Automattic, the owner of WordPress.com and Tumblr, was also said to be in licensing talks earlier this year.) Since the details of those deals haven’t been disclosed, it’s unclear how big their respective corpora are.
When WIRED asked about the size of the datasets it will consider for commercial licensing, OpenAI declined to share that information. But Wood emphasizes that the company’s partnerships with publishers are “focused on getting their content featured in our products and driving traffic to them.”