September 3rd, 2024

OpenAI Pleads It Can't Make Money Without Using Copyrighted Materials for Free

OpenAI has requested permission from the British Parliament to use copyrighted materials for AI training, arguing it's essential for developing effective models, despite facing legal challenges and industry skepticism.


OpenAI has requested permission from the British Parliament to use copyrighted materials for training its artificial intelligence models, arguing that it is essential for the development of advanced large language models (LLMs). In a submission to a House of Lords subcommittee, OpenAI stated that relying solely on public domain content would not suffice for creating effective AI systems. The company emphasized that copyright law currently encompasses a wide range of human expressions, making it nearly impossible to train AI without utilizing copyrighted works. OpenAI maintains that it complies with copyright laws and believes that training AI models does not violate these laws. However, this stance has faced significant opposition, including lawsuits from the New York Times and the Authors Guild, which argue that OpenAI's practices infringe on intellectual property rights and threaten the livelihoods of writers. OpenAI claims it is working to establish new partnerships with publishers to address these concerns, but skepticism remains regarding the acceptance of such arrangements by various stakeholders in the publishing industry.

- OpenAI argues it cannot train AI models without using copyrighted materials.

- The company claims compliance with copyright laws in its AI training practices.

- Legal challenges from the New York Times and the Authors Guild highlight opposition to OpenAI's methods.

- OpenAI is seeking partnerships with publishers to mitigate copyright issues.

- Concerns persist about the impact of AI on the livelihoods of writers and content creators.

Related

Microsoft says that it's okay to steal web content because it's 'freeware.'

Microsoft's CEO of AI, Mustafa Suleyman, believes web content is "freeware" for AI training unless specified otherwise. This stance has sparked legal disputes and debates over copyright infringement and fair use in AI content creation.

Microsoft CEO of AI: Your online content is 'freeware' fodder for training models

Mustafa Suleyman, CEO of Microsoft AI, faced legal action for using online content as "freeware" to train neural networks. The debate raises concerns about copyright, AI training, and intellectual property rights.

OpenAI Wants New York Times to Show How Original Its Copyrighted Articles Are

OpenAI requests New York Times' materials for copyright assessment amid infringement claim. Times objects to broad approach, fearing chilling effect. Legal battle showcases AI-copyright tension.

OpenAI pleads it can't make money without using copyrighted material for free

OpenAI requests British Parliament to permit copyrighted material for AI training. Facing legal challenges from NYT and Authors Guild for alleged copyright infringement. Debate impacts AI development and copyright protection, raising concerns for content creators.

Has your paper been used to train an AI model? Almost certainly

Academic publishers are selling research papers to AI firms, raising copyright concerns. Major deals include Taylor & Francis with Microsoft and Wiley with another company, prompting legal disputes and researcher frustrations.

44 comments
By @TaylorAlexander - 7 months
This is because they: expanded upon an existing sample-inefficient technology, commercialized the sample inefficient technology using copyrighted data, fundraised and expanded operations using this legally questionable technology, and are now complaining that they can’t balance their business expenses if they can’t keep using other people’s copyrighted works to feed their extremely sample inefficient data monster.

What they could have done was stayed as an open research org when the tech started to work, and focused on sample efficiency and cultivating copyright free data sets. But they were too impatient to commercialize.

Whoops.

I don’t actually think intellectual property restrictions are good, but I don’t want a world where small creators have their rights stomped on by multi billion dollar corporations. Either we have copyright or we don’t, but unless OpenAI is also going to give up their copyrights this seems deeply unfair.

By @disposition2 - 7 months
Looks like this is from Jan. 2024...and there doesn't seem to be an update on the article itself.

One might be able to find more information on the Committee's webpage (although, I'm not very familiar with the UK government...so this might not be accurate), https://committees.parliament.uk/work/7827/large-language-mo...

By @bmitc - 7 months
I can't stand corporations with excuses like this. "But we have too much scale to fix that." "But we need this data to operate." "We can't be responsible at our scale." I say too bad for all of it. If you're not a viable business without bending rules and laws, then you're not a viable business.

By @TheCleric - 7 months
I'd like to frame this another way. OpenAI is saying they can't survive without copyright owners consenting to use their property to make OpenAI money. And honestly, if your business model is dependent on that, then that's your problem.

We can argue over whether you should need consent or not, but personally I find nothing wrong with someone being unable to use things I've created to make a buck without my permission (unless otherwise indicated by an explicit license).

By @gwbas1c - 7 months
IMO: I think this is a very strong case for copyright reform, and a very strong indicator that our public domain isn't healthy enough.

By @justsomeshmuck - 7 months
I think it is important for America and Europe to take the side of using copyrighted works for LLM is fair use/not illegal. Advancement in this space in the west will be hindered otherwise, and nations that don’t respect IP law will have enormous advantage.
By @seizethecheese - 7 months
The alarmism in this thread is misguided. By advocating for excluding copyrighted works from LLM training, you are advocating an expansion of copyright protection. This attitude is opposite to my understanding of the hacker ethos.
By @elliottkember - 7 months
A very misleading title, that's not what they're saying at all. They're saying that training does not constitute a breach of copyright. "legally copyright law does not forbid training."
By @blacksmith_tb - 7 months
I wouldn't want to come to their defense, but the argument reminds me a little of earlier fights about if search engines owe the sites they crawl anything. Which led to things like Canada's Online News Act[1] which doesn't seem to me to have been very good for users in Canada (but I am not in Canada, maybe it has upsides?)

From my perspective, before OpenAI used/stole these copyrighted works, the public had to pay the original creators to get access to them, and now they've been absorbed into ChatGPT and friends, we have to pay someone else... seems like a wash for end users?

1: https://www.nbcnews.com/tech/tech-news/google-canada-law-onl...

By @dcwca - 7 months
It is totally legal to train on this stuff, but illegal to reproduce copyrighted works. Interestingly, Google's business model could have been criticized the same way. They construct a big index of copyrighted works, reproduce them, and monetize it.
By @Workaccount2 - 7 months
The big question is whether a judge will consider a vector space many orders of magnitude smaller than its training set, and not really containing anything that resembles legible data, to be an archive of copyrighted works.

To me it makes way more sense to just censor outputs. I can draw Batman from memory, but I wouldn't go out and start selling Batman drawings. I can easily self-censor.

The solution for transformers is plainly obvious, but I can understand the fear of training something that might well displace you.

By @tim333 - 7 months
>OpenAI Pleads It Can't Make Money Without Using Copyrighted Materials for Free

is not of course true. It said that if it can't use the materials then its product would be bad and it would lose out to Chinese competitors who did not have the restrictions.

Not quite sure what the answer is, but I spent a fair bit of time today trying to get access to some paper not produced by Elsevier but for which they have managed to gate the world's access, to make a few bob. There's a lot to be said for information being free.

By @robryan - 7 months
It is interesting that they both licence content and say it is fair use. Seems like those who complain the loudest will get something for their content and everyone else will get nothing.

By @ChrisArchitect - 7 months
Not a new story.

Some discussion in January:

https://news.ycombinator.com/item?id=38912259

By @burnte - 7 months
My response is: Ok. Not my problem. OpenAI isn't entitled to free profit. No one is.

By @skeledrew - 7 months
It makes sense, at a base level. Where would the cost of training (and thus the cost of accessing the service, if it got to that point without going bankrupt) be if all copyrighted works had to be paid for? What would the model quality be (and thus what would the model be worth) if only public domain content could be utilized? The only reasons this is an issue are because they got to the making-money point (and people don't like seeing money being made if they can't get in on the action), and there are content creators afraid that this thing will make them obsolete (which will be the case for many). Typical clash of interests.
By @GiorgioG - 7 months
Let's face it current AI/LLMs are nice tinker-toys, but they are the tiny building blocks that the real power will come from some day - when we can run tens of thousands of these models to solve real problems and not...AutoComplete++. The hardware will need to be exponentially more powerful, efficient and cheap before the real AI-powered future we've been hyped/promised can be realized.
By @jmclnx - 7 months
>OpenAI is begging the British Parliament to allow it to use copyrighted works because it's supposedly "impossible" for the company to train its artificial intelligence models

If it were up to me, I would allow OpenAI access only if they licensed every single line of source under the GPLv3 (yes, v3).

Under any other license, "tough to be you".

I expect OpenAI to go proprietary once they hit a certain level of market strength.

By @OutOfHere - 7 months
AI shows the many flaws in various poorly thought out IP and information related laws which should never have existed in the first place.
By @jasonlfunk - 7 months
Can someone help me understand why it's a problem for companies to train these huge LLM on your copyrighted material? What exactly is the harm that is being done to the copyright holder?

I can understand why the New York Times (for example) wants to claim that a couple billion dollar companies have done it actual harm; but I am struggling to actually identify what it is.

By @leobg - 7 months
One more reason why this is B.S.:

They can obviously license that content.

No rights to publish. Just the right to use the content as part of their training data for the AI.

By @NemoNobody - 7 months
I think this is the best argument I've ever seen that AI ought to be a public good.
By @ChuckMcM - 7 months
Okay, that is just fucking hilarious. (sorry for the profanity).
By @quantum_state - 7 months
would sound very funny if it were generalized…
By @babyshake - 7 months
"I drink your milkshake" is the best four word summary of the situation I can imagine.
By @exe34 - 7 months
awh diddums. I also have to remain poor if I obey the law.
By @fungiblecog - 7 months
So big corporations love copyright laws when they can use them to make enormous profits - but then want exemptions when the exact same laws don’t allow them to make enormous profits. Welcome to the world we get when we let rich arseholes make all the rules.
By @biglyburrito - 7 months
Oh no. Anyway…
By @findthewords - 7 months
Let us put it bluntly, the AI bubble is built on piracy.
By @tivert - 7 months
In other news: burglar says he can't make money from his burglary skills without stealing, and pleads that the laws against theft be repealed.
By @dkersten - 7 months
I would also be able to make money if I could use copyrighted material for free, but that doesn’t make it ok for me to do.

If they can’t exist without doing so, then maybe they shouldn’t exist. They don’t have any inherent right to making money.

By @_heimdall - 7 months
Isn't this the exact same business model that Eric Schmidt bragged about at Stanford?

1) steal IP and build a thing

2a) if it fails, rinse and repeat with a "new" step 1

2b) if it succeeds, hire a fleet of lawyers to clean up the mess

3) get rich

By @OutOfHere - 7 months
There is no violation because it can be argued that the AI is a sentient entity, and sentient entities have a right to read and remember texts borrowed from the library.