Who Owns AI-Generated Content?
Art, copyright infringement, and redistribution of value with GAI
Part of the beauty of new innovations swerving into the mainstream is that users no longer have to be technical, or consider what’s happening under the hood, for them to work. For the sake of those tilling the digital soil to make apps like ChatGPT and DALL·E 2 possible, we should consider who owns this content and how value is distributed.
Our Covenant with Platforms
When we agree to use a service like Google, we implicitly enter an agreement for our usage data to be harvested to make the platform better. In exchange, we get a free service. These are, in fact, the Terms of Service. Since the rise of algorithmically-placed content, users have created value for platforms that they do not directly reap. Users are not paid for their role in improving the algorithm, in the same way that artists are not compensated for works feeding generative AI algorithms.
A key difference is that artists, writers, and creators never enter this covenant with the corporations leveraging their work as training data. Training data is what allows powerful machine learning models to find and recreate patterns in images or text. Of course, creators’ work is not directly copied; it is mushed together with a mass of other data and used to train an algorithm that produces ‘new’ content. This makes ownership and copyright questions murky.
How Do GAI Companies Deal with Ownership?
Midjourney, one of the more popular text-to-image tools, cites users’ rights to generated content in its Terms of Service: “Subject to the above license, you own all Assets you create with the Services.” It goes on to describe that to use the content for commercial purposes (barring exceptions) you must purchase a commercial plan. Basically, if you want to make lots of money off of generated content, you have to pay them more money.
In its ToS, OpenAI delegates legal responsibility to its users: “You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.”
Any reference to compensating the artists for their role in the training data is conspicuously absent.
US Copyright Law
For companies like OpenAI, valued at $29 billion, the question of who owns generated content is an important one that should evolve beyond delegating it to users. A few schools of thought are emerging here:
AI-generated work belongs in the public domain as human creative input is required for copyright
AI-generated work is derivative of the artists on which the algorithm was trained and thus, belongs to the artists
Some messy in-between
“Absent human creative input, a work is not entitled to copyright protection. As a result, the U.S. Copyright Office will not register a work that was created by an autonomous artificial intelligence tool,” states Esquenet, a copyright attorney.
Shades of gray emerge when we consider works like this:
This is a picture I made using DALL·E 2, then photoshopped to change the graphic on the screen and add an Apple logo. Does it have the ‘human authorship necessary to support a copyright claim’? The jury is still out. In the meantime, artists are having their styles blatantly recreated by generative AI users who don’t pay for their work.
The data used to train DALL·E 2 and Midjourney includes copyrighted artists’ works. Both companies put the burden on artists to actively request that their material be removed from the training set if they believe it infringes on their rights. Oddly, neither provides a searchable database to determine whether one’s work is being used for training. Luckily, I found one.
I applaud innovation at companies like OpenAI, but I hope to see better attempts at distributing value to those who are instrumental to the training data: artists, data cleaners, and perhaps even bloggers. A simple step in the right direction would be to add transparency around where the images come from and to share resources with artists who can seek remuneration for their labor. Attribution for each generated image is a technical problem outside the scope of this piece; still, details on where the training data comes from and more accessible takedown provisions would go a long way.