August 28th, 2024

Judge dismisses majority of GitHub Copilot copyright claims

A judge dismissed most copyright claims against GitHub, Microsoft, and OpenAI regarding GitHub Copilot, leaving two claims active while rejecting the DMCA claim and barring the dismissed claims from being refiled.


A judge has dismissed most of the copyright claims in a lawsuit against GitHub, Microsoft, and OpenAI over the AI-powered coding assistant GitHub Copilot. The lawsuit, initiated by developers in 2022, originally included 22 claims alleging copyright violations. Judge Jon Tigar's recently unsealed ruling leaves only two claims active: one concerning an open-source license violation and another alleging breach of contract.

The court dismissed the primary allegation that GitHub Copilot violated the Digital Millennium Copyright Act (DMCA) by suggesting code without proper attribution. The judge found the developers' arguments unconvincing, stating that the code in question was not sufficiently similar to the original works, and noted that GitHub Copilot rarely reproduces memorized code. Consequently, the judge dismissed the DMCA claim with prejudice, preventing the developers from refiling it. Requests for punitive damages and monetary relief were also dismissed.

Despite this ruling, the legal battle continues: the remaining claims are likely to proceed through litigation, highlighting the ongoing legal complexities surrounding AI coding assistants and their training on existing codebases.

- Most copyright claims against GitHub Copilot have been dismissed.

- Only two claims remain: one for open-source license violation and one for breach of contract.

- The judge found the arguments regarding DMCA violations unconvincing.

- The ruling prevents the developers from refiling the dismissed claims.

- The case underscores the legal challenges faced by AI-powered coding tools.

25 comments
By @KyleBerezin - 8 months
I will throw in a random story here about ChatGPT 4.0. I'm not commenting on this article directly, just a somewhat related anecdote. I was using ChatGPT to help me write some Android OpenGL rendering code. OpenGL can be very esoteric, and I haven't touched it for at least 10 years.

Everything was going great and I had a working example, so I decided to look online for some example code to verify I was doing things correctly and not making any glaring mistakes. It was then that I found an exact line-by-line copy of what ChatGPT had given me. This was before it had the ability to search the web, and the code predated OpenAI. It had even brought across spelling errors in the variable names; the only thing it changed was translating the comments from Spanish to English.

I had always been under the impression that ChatGPT just learned from sources and then gave you a new result based roughly on them. I think the confounding variables here were: (1) this was a very specific use case and not many examples existed, and (2) all OpenGL code looks similar, to a point.

The worst part was that no license was provided for the code or the repo, so it was not legal for me to take the code wholesale like that. I am now much more cautious about asking ChatGPT for code; I only have it give me direction now, and no longer use 'sample code' that it produces.
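A toy way to catch this kind of verbatim reuse (as a quick sanity check, not a legal test) is to normalize away cosmetic differences such as comments and whitespace before comparing a generated snippet against a suspected source. A minimal sketch, with all function names hypothetical:

```python
import re

def normalize(code: str) -> list[str]:
    """Strip comments and blank lines and collapse whitespace, so that
    cosmetic edits (translated comments, reindentation) do not hide a copy."""
    out = []
    for line in code.splitlines():
        line = re.sub(r"//.*|#.*", "", line)   # drop //- and #-style line comments
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            out.append(line)
    return out

def overlap_ratio(generated: str, reference: str) -> float:
    """Fraction of normalized lines in `generated` that appear verbatim
    in `reference`; 1.0 means every line has an exact match."""
    gen = normalize(generated)
    ref = set(normalize(reference))
    if not gen:
        return 0.0
    return sum(1 for line in gen if line in ref) / len(gen)

# A snippet whose only difference is a translated comment still matches fully.
generated = "glClear(GL_COLOR_BUFFER_BIT);  // clear the screen"
reference = "glClear(GL_COLOR_BUFFER_BIT);  // limpiar la pantalla"
print(overlap_ratio(generated, reference))  # → 1.0
```

A real clone detector would compare token sequences or ASTs rather than raw lines, but even this crude check would have flagged the situation above, where only the comment language changed.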

By @nl - 8 months
The original reporting has more details: https://www.developer-tech.com/news/judge-dismisses-majority...

In particular this:

An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.

The developers argued that users who turned off this filter would “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.

However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”

I think this is the key point: reproduction is the issue, not training. And as noted in the study[1], reproduction doesn't usually happen unless you go to extra lengths to make it happen.

[1] Not sure but maybe https://dl.acm.org/doi/abs/10.1145/3597503.3639133? Can anyone find the filing?

By @darby_nine - 8 months
Huh, I guess you can just avoid legal liability by laundering it through a chatbot.
By @austin-cheney - 8 months
The comments seem to misunderstand copyright. Copyright protects a literal work product from unauthorized duplication and nothing else. Even then there are numerous exceptions like fair use and personal backups.

Copyright does not restrict reading a book or watching a movie. Copyright also does not restrict access to a work. It only restricts duplication without express authorization. As for computer data, the restricted duplication typically refers to dedicated storage, such as storage on disk as opposed to storage in CPU cache.

When Viacom sued YouTube for $1.6 billion they were trying to halt the public from accessing their content on YouTube. They only sued YouTube, not YouTube users, and only because YouTube stored Viacom IP without permission.

By @maronato - 8 months
The judge argues that Copilot “rarely emits memorised code in benign situations”, but what happens when it does? It is bound to happen some day, and when it does, would I be breaching copyright by publishing the code Copilot wrote? Just a few weeks ago, a very similar suit over Stable Diffusion had its motion to dismiss the copyright infringement claims denied. https://arstechnica.com/tech-policy/2024/08/artists-claim-bi...
By @ChrisArchitect - 8 months
Misleading OP.

Discussion from July:

Judge dismisses DMCA copyright claim in GitHub Copilot suit

https://news.ycombinator.com/item?id=40919253

By @AnimalMuppet - 8 months
Interesting. The parts that survived are the contract claims and the open-source license claims.

Contract is understandable - it supersedes almost everything else. If the law says I can do X but the contract says I can't, then I almost certainly can't.

It's nice to see open-source licenses being treated as having roughly the same solidity as a contract.

By @panic - 8 months
If you have access to the Copilot weights, you should consider leaking them. We shared our code with you because we wanted it to be free, not sold back to us at $10/month.
By @jsyang00 - 8 months
> leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract

These seem to be major claims?

By @23B1 - 8 months
https://sfconservancy.org/GiveUpGitHub/

I was lucky to learn early on that publishing important things to the web meant relinquishing control of not just the IP, but my own agency and fate. The cost far exceeded the benefits of generosity, be it contributions to FOSS, public blogging or documentation, or even just writing.

Time is the only fixed resource, and mine is proprietary, exclusive, and for sale to the highest bidder.

By @nadermx - 8 months
The purpose of Copyright is to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

"Sciences" refers not only to fields of modern scientific inquiry but rather to all knowledge.

The hacker ethic is a philosophy and set of moral values within hacker culture. Practitioners believe that sharing information and data with others is an ethical imperative.

hrmmm...

By @seanw444 - 8 months
Where do we draw the line between AI learning from codebases to offer code solutions, and humans learning from codebases to offer code solutions?
By @btown - 8 months
(July 10, 2024)
By @tidenly - 8 months
A lot of people dislike LLMs and generative AI (fairly) and are reflexively trying to reach for tools in our legal framework, claiming it's obviously already illegal. I don't think this is going to work. Generative AI is quite obviously novel to anyone who isn't in denial - and claiming existing copyright laws are going to cover it seems like a lost cause.

We need new laws, especially regarding deepfakes; it's shocking how many people think revenge porn laws and the like are going to be enough here. Rather than just focusing on data usage, we need more fundamental laws and rights, such as a right to control representations of ourselves. Japan has this: producing images, voice, or video in someone's likeness is directly prosecutable. Likewise, we need laws that explicitly target the use of data for training, separate from copyright.

The way LLMs are trained is simply too similar to how humans learn, and the transformation and output produce works that are novel based on that "learning", just as humans do. This is so fundamentally different from what copyright laws were made to cover that I find it infuriating how many people handwave these arguments away. Only in perfect 1-to-1 regurgitation does it even feel close to something copyright could cover.

By @PaulKeeble - 8 months
All the abuse of the intent of open-source licenses has resulted in me not writing any open-source code. I have far fewer issues with a code generator trained on GPL code that produces GPL code, with the LLM itself being under the GPL as well. It's the commercial licensing and charging for it that seems to breach the intent of these licenses to me.

I guess Microsoft has finally gotten what it wanted and reached the "extinguish" stage of its plan for open source, and all it needed was a chatbot.

By @Palmik - 8 months
Curious to see if the same will apply to other materials like news, books, images, music, movies, etc.
By @robswc - 8 months
I honestly just don't see how all this will work legally, in the future.

I don't know anything an LLM (or "AI") can do that a human couldn't, with enough time. If it can get a human in trouble, it should get the operators of the AI in trouble too. Likewise, if a human can do it, I don't see why an AI is any different.

By @slowhadoken - 8 months
I’ve heard corporate types call open-source projects “security risks” and “commie nonsense”, but that doesn’t stop them from trying to acquire the work for free to profit off of it. It’s greedy and duplicitous. It’s capture.
By @beeboobaa3 - 8 months
Guess microsoft paid them off
By @InDubioProRubio - 8 months
Finally, the great IP washing machine hums and can dissolve the whole structure. Bring forth your disassembly, to generate a draft, to re-generate clean source code. Corporate communism! It is done!
By @coding123 - 8 months
It's always fun to see an AI copyright thread on HN: the same people who want copyright abolished suddenly want the strongest copyright ever to exist.