Meta CEO Mark Zuckerberg says there are complex copyright issues around scraping data to train AI models, but that the individual work of most creators isn't valuable enough to matter. In an interview with The Verge's deputy editor Alex Heath, Zuckerberg said Meta will likely strike “certain partnerships” for useful content, but that if others demand payment, the company would rather walk away, as it has with news outlets.
“I think individual authors or publishers tend to overestimate the value of their specific content in the bigger picture,” Zuckerberg said in the interview, which coincided with Meta's annual Connect event. “My guess is that there will be certain partnerships formed when content is really important and valuable.” But if authors are concerned or object, “when push comes to shove and they request that we not use their content, then we're just not going to use their content. It's not like that's going to change the outcome of this thing much.”
Meta, like nearly every major AI company, is currently embroiled in legal battles over the limits of unauthorized data scraping for AI training. Last year, the company was sued by a group of authors, including Sarah Silverman, who claimed Meta's Llama model was unlawfully trained on pirated copies of their works. (The case is currently not going well for those authors; last week, a judge reprimanded their legal team for being “either unwilling or unable to follow due process.”)
The company argues – like almost every major AI player – that this kind of unauthorized scraping should be allowed under U.S. fair use law. Zuckerberg elaborated on the issue:
I think that in any new medium of technology, there are concepts around fair use and where the line is of what you have control over. When you put something out into the world, to what extent do you still have control over it, ownership over it, and license over it? I think all of these things fundamentally need to be renegotiated and discussed in the AI age.
The history of copyright is, in effect, a history of deciding how much control people have over their own published works. Fair use is designed to let people modify and build on the works of others without permission or compensation, and that is very often a good thing. But some AI developers have interpreted it far more broadly than most courts have. Microsoft's AI CEO, for example, said earlier this year that everything “on the open web” is “freeware” and “anyone can copy it, recreate it, and reproduce it.” (That is categorically wrong, legally speaking: content published publicly online is no less protected by copyright than any other medium, and it can be copied or modified under fair use only to the same extent as a book, movie, or paywalled article.)
Meanwhile, some artists have turned to unofficial tools meant to prevent their work from being used for AI training. But especially for anything posted to social media before the advent of generative AI, their efforts are sometimes undercut by terms of service that let these companies train on their work. Meta has said it trains its AI tools on public Instagram and Facebook posts.
Zuckerberg said Meta's future AI content strategy will likely mirror its combative response to proposed laws that would require payment for linking to news articles. The company has typically responded to those rules by blocking news outlets in countries like Australia and Canada. “Look, we're a big company,” he said. “We'll pay for content if it's valuable to people. We're just not going to pay for content if it's not valuable to people. I think you're probably going to see a similar dynamic with AI.”
We've known for a while that news isn't particularly valuable to Meta, in part because moderating it generates controversy and (according to Meta) makes users feel bad. (“If we were actually just following the wishes of our community, we'd show even less than we do,” Zuckerberg said in the interview.) The company's generative AI products are still in their infancy, and it's not clear whether anyone has figured out what people want from these tools. But whatever it is, most creators probably shouldn't expect to get paid for it.