AI companies have all kinds of arguments against paying for copyrighted content

The biggest companies in AI aren’t interested in paying to use copyrighted material as training data, and here are their reasons why.

By Wes Davis, a weekend editor who covers the latest in tech and entertainment. He has written news, reviews, and more as a tech journalist since 2020.

Nov 4, 2023, 10:17 PM UTC

The US Copyright Office is taking public comment on potential new rules around generative AI’s use of copyrighted materials, and the biggest AI companies in the world had plenty to say. We’ve collected the arguments from Meta, Google, Microsoft, Adobe, Hugging Face, StabilityAI, and Anthropic below, as well as a response from Apple that focused on copyrighting AI-written code.

There are some differences in their approaches, but the overall message for most is the same: They don’t think they should have to pay to train AI models on copyrighted work.

The Copyright Office opened the comment period on August 30th, with an October 18th due date for written comments regarding changes it was considering around the use of copyrighted data for AI model training, whether AI-generated material can be copyrighted without human involvement, and AI copyright liability. There’s been no shortage of copyright lawsuits in the last year, with artists, authors, developers, and companies alike alleging violations in different cases.

Here are some snippets from each company’s response.

Meta: Copyright holders wouldn’t get much money anyway

Imposing a first-of-its-kind licensing regime now, well after the fact, will cause chaos as developers seek to identify millions and millions of rightsholders, for very little benefit, given that any fair royalty due would be incredibly small in light of the insignificance of any one work among an AI training set.

Google: AI training is just like reading a book

If training could be accomplished without the creation of copies, there would be no copyright questions here. Indeed that act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like the act of reading a book and learning the facts and ideas within it, would not only be non-infringing, it would further the very purpose of copyright law. The mere fact that, as a technological matter, copies need to be made to extract those ideas and facts from copyrighted works should not alter that result.

Microsoft: Changing copyright law could hurt small AI developers

Any requirement to obtain consent for accessible works to be used for training would chill AI innovation. It is not feasible to achieve the scale of data necessary to develop responsible AI models even when the identity of a work and its owner is known. Such licensing schemes will also impede innovation from start-ups and entrants who don’t have the resources to obtain licenses, leaving AI development to a small set of companies with the resources to run large-scale licensing programs or to developers in countries that have decided that use of copyrighted works to train AI models is not infringement.

Anthropic: Current law is fine; don’t change it

Sound policy has always recognized the need for appropriate limits to copyright in order to support creativity, innovation, and other values, and we believe that existing law and continued collaboration among all stakeholders can harmonize the diverse interests at stake, unlocking AI’s benefits while addressing concern.

Adobe: It’s fair use, like when Accolade copied Sega’s code

In Sega v. Accolade, the Ninth Circuit held that intermediate copying of Sega’s software was fair use. The defendant made copies while reverse engineering to discover the functional requirements—unprotected information—for making games compatible with Sega’s gaming console. Such intermediate copying also benefited the public: it led to an increase in the number of independently designed video games (which contain a mix of functional and creative aspects) available for Sega’s console. This growth in creative expression was precisely what the Copyright Act was intended to promote.

Anthropic: Copying is just an intermediate step

For Claude, as discussed above, the training process makes copies of information for the purposes of performing a statistical analysis of the data. The copying is merely an intermediate step, extracting unprotectable elements about the entire corpus of works, in order to create new outputs. In this way, the use of the original copyrighted work is non-expressive; that is, it is not re-using the copyrighted expression to communicate it to users.

Andreessen Horowitz: Investors have spent ‘billions and billions’

Over the last decade or more, there has been an enormous amount of investment—billions and billions of dollars—in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted. A change in this regime will significantly disrupt settled expectations in this area. Those expectations have been a critical factor in the enormous investment of private capital into U.S.-based AI companies which, in turn, has made the U.S. a global leader in AI. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.

Hugging Face: Training on copyrighted material is fair use

The use of a given work in training is of a broadly beneficial purpose: the creation of a distinctive and productive AI model. Rather than replacing the specific communicative expression of the initial work, the model is capable of creating a wide variety of different sort of outputs wholly unrelated to that underlying, copyrightable expression. For those and other reasons, generative AI models are generally fair use when they train on large numbers of copyrighted works. We use “generally” deliberately, however, as one can imagine patterns of facts that would raise tougher calls.

StabilityAI: Other countries call AI model training fair use

A range of jurisdictions including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have reformed their copyright laws to create safe harbors for AI training that achieve similar effects to fair use. In the United Kingdom, the Government Chief Scientific Advisor has recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise [sic] existing protections of copyright and IP law on the output of AI.”

Apple: Let us copyright our AI-made code

In circumstances where a human developer controls the expressive elements of output and the decisions to modify, add to, enhance, or even reject suggested code, the final code that results from the developer’s interactions with the tools will have sufficient human authorship to be copyrightable.