August 28th, 2024

Judge dismisses majority of GitHub Copilot copyright claims

A judge dismissed most copyright claims against GitHub, Microsoft, and OpenAI regarding GitHub Copilot, leaving two claims active while rejecting the DMCA claim and barring the dismissed claims from being refiled.


A judge has dismissed most of the copyright claims in a lawsuit against GitHub, Microsoft, and OpenAI over the AI-powered coding assistant GitHub Copilot. The lawsuit, initiated by developers in 2022, originally included 22 claims alleging copyright violations. Judge Jon Tigar's recently unsealed ruling leaves only two claims active: one concerning an open-source license violation and another alleging breach of contract.

The court dismissed the primary allegation that GitHub Copilot violated the Digital Millennium Copyright Act (DMCA) by suggesting code without proper attribution. The judge found the developers' arguments unconvincing, stating that the code in question was not sufficiently similar to the original works, and noted that GitHub Copilot rarely reproduces memorized code. Consequently, the judge dismissed the DMCA claim with prejudice, preventing the developers from refiling it. Requests for punitive damages and monetary relief were also dismissed.

Despite this ruling, the legal battle continues: the remaining claims are likely to proceed through litigation, highlighting the ongoing legal complexities surrounding AI coding assistants and their training on existing codebases.

- Most copyright claims against GitHub Copilot have been dismissed.

- Only two claims remain: one for open-source license violation and one for breach of contract.

- The judge found the arguments regarding DMCA violations unconvincing.

- The ruling prevents the developers from refiling the dismissed claims.

- The case underscores the legal challenges faced by AI-powered coding tools.

25 comments
By @KyleBerezin - 8 months
I will throw in a random story here about ChatGPT 4.0. I'm not commenting on this article directly, just a somewhat related anecdote. I was using ChatGPT to help me write some Android OpenGL rendering code. OpenGL can be very esoteric, and I haven't touched it for at least 10 years.

Everything was going great and I had a working example, so I decided to look online for some example code to verify I was doing things correctly and not making any glaring mistakes. It was then that I found an exact line-by-line copy of what ChatGPT had given me. This was before it had the ability to search the web, and the code predated OpenAI. It had even brought across spelling errors in the variable names; the only thing it changed was translating the comments from Spanish to English.

I had always been under the impression that ChatGPT just learned from sources and then gave you a new result based roughly on them. I think the confounding variables here were: (1) this was a very specific use case and not many examples existed, and (2) all OpenGL code looks similar, to a point.

The worst part was that no license was provided for the code or the repo, so it was not legal for me to take the code wholesale like that. I am now much more cautious about asking ChatGPT for code; I only have it give me direction now, and no longer use 'sample code' that it produces.
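A toy way to catch this kind of verbatim reuse (as a quick sanity check, not a legal test) is to normalize away cosmetic differences such as comments and whitespace before comparing a generated snippet against a suspected source. A minimal sketch, with all function names hypothetical:

```python
import re

def normalize(code: str) -> list[str]:
    """Strip comments and blank lines and collapse whitespace, so that
    cosmetic edits (translated comments, reindentation) do not hide a copy."""
    out = []
    for line in code.splitlines():
        line = re.sub(r"//.*|#.*", "", line)   # drop //- and #-style line comments
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            out.append(line)
    return out

def overlap_ratio(generated: str, reference: str) -> float:
    """Fraction of normalized lines in `generated` that appear verbatim
    in `reference`; 1.0 means every line has an exact match."""
    gen = normalize(generated)
    ref = set(normalize(reference))
    if not gen:
        return 0.0
    return sum(1 for line in gen if line in ref) / len(gen)

# A snippet whose only difference is a translated comment still matches fully.
generated = "glClear(GL_COLOR_BUFFER_BIT);  // clear the screen"
reference = "glClear(GL_COLOR_BUFFER_BIT);  // limpiar la pantalla"
print(overlap_ratio(generated, reference))  # → 1.0
```

A real clone detector would compare token sequences or ASTs rather than raw lines, but even this crude check would have flagged the situation above, where only the comment language changed.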

By @nl - 8 months
The original reporting has more details: https://www.developer-tech.com/news/judge-dismisses-majority...

In particular this:

An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.

The developers argued that users who turned off this filter would “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.

However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”

I think this is the key point: reproduction is the issue, not training. And as noted in the study[1], reproduction doesn't usually happen unless you go to extra lengths to make it happen.

[1] Not sure but maybe https://dl.acm.org/doi/abs/10.1145/3597503.3639133? Can anyone find the filing?

By @darby_nine - 8 months
Huh, I guess you can just avoid legal liability by laundering it through a chatbot.
By @austin-cheney - 8 months
The comments seem to misunderstand copyright. Copyright protects a literal work product from unauthorized duplication and nothing else. Even then there are numerous exceptions like fair use and personal backups.

Copyright does not restrict reading a book or watching a movie. Copyright also does not restrict access to a work. It only restricts duplication without express authorization. As for computer data, the restricted duplication typically refers to dedicated storage, such as storage on disk as opposed to storage in CPU cache.

When Viacom sued YouTube for $1.6 billion they were trying to halt the public from accessing their content on YouTube. They only sued YouTube, not YouTube users, and only because YouTube stored Viacom IP without permission.

By @maronato - 8 months
The judge argues that Copilot “rarely emits memorised code in benign situations”, but what happens when it does? It is bound to happen some day, and when it does, would I be breaching copyright by publishing the code Copilot wrote? Just a few weeks ago, a very similar suit over Stable Diffusion had its motion to dismiss the copyright infringement claims denied. https://arstechnica.com/tech-policy/2024/08/artists-claim-bi...
By @ChrisArchitect - 8 months
Misleading OP.

Discussion from July:

Judge dismisses DMCA copyright claim in GitHub Copilot suit

https://news.ycombinator.com/item?id=40919253

By @AnimalMuppet - 8 months
Interesting. The parts that survived are the contract claims and the open-source license claims.

Contract is understandable - it supersedes almost everything else. If the law says I can do X but the contract says I can't, then I almost certainly can't.

It's nice to see open-source licenses being treated as having roughly the same solidity as a contract.

By @panic - 8 months
If you have access to the Copilot weights, you should consider leaking them. We shared our code with you because we wanted it to be free, not sold back to us at $10/month.
By @jsyang00 - 8 months
> leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract

These seem to be major claims?

By @23B1 - 8 months
https://sfconservancy.org/GiveUpGitHub/

I was lucky to learn early on that publishing important things to the web meant relinquishing control of not just the IP, but my own agency and fate. The cost far exceeded the benefits of generosity, be it contributions to FOSS, public blogging or documentation, or even just writing.

Time is the only fixed resource, and mine is proprietary, exclusive, and for sale to the highest bidder.

By @nadermx - 8 months
The purpose of Copyright is to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

"Sciences" refers not only to fields of modern scientific inquiry but rather to all knowledge.

The hacker ethic is a philosophy and set of moral values within hacker culture. Practitioners believe that sharing information and data with others is an ethical imperative.

hrmmm...

By @seanw444 - 8 months
Where do we draw the line between AI learning from codebases to offer code solutions, and humans learning from codebases to offer code solutions?
By @btown - 8 months
(July 10, 2024)
By @tidenly - 8 months
A lot of people dislike LLMs and generative AI (fairly) and are reflexively trying to reach for tools in our legal framework, claiming it's obviously already illegal. I don't think this is going to work. Generative AI is quite obviously novel to anyone who isn't in denial - and claiming existing copyright laws are going to cover it seems like a lost cause.

We need new laws, especially regarding deepfakes; it's shocking how many people think revenge porn laws and the like are going to be enough here. Rather than just focusing on data usage, we need more fundamental laws and rights, such as a right to control representations of ourselves. Japan has this: producing images, voice, or video in someone's likeness is directly prosecutable. Likewise, we need laws that explicitly target the use of data for training, separate from copyright.

The way LLMs are trained is simply too similar to how humans learn, and the transformation and output produce works that are novel based on that "learning", just as humans do. This is so fundamentally different from what copyright laws were made to cover that I find it infuriating how many people handwave these arguments away. Only in perfect 1-to-1 regurgitation does it even feel close to something copyright could cover.

By @PaulKeeble - 8 months
All the abuse of the intent of open-source licenses has resulted in me not writing any open-source code. I have far fewer issues with a code generator trained on GPL code that produces GPL code, with the LLM itself being under the GPL as well. It's the commercial licensing and charging for it that seems to breach the intent of these licenses to me.

I guess Microsoft has finally gotten what it wanted and reached the "extinguish" stage of its plan for open source, and all it needed was a chatbot.

By @Palmik - 8 months
Curious to see if the same will apply to other materials like news, books, images, music, movies, etc.
By @robswc - 8 months
I honestly just don't see how all this will work legally, in the future.

I don't know anything an LLM (or "AI") can do that a human couldn't, with enough time. If it can get a human in trouble, it should get the operators of the AI in trouble too. Likewise, if a human can do it, I don't see why an AI is any different.

By @slowhadoken - 8 months
I’ve heard corporate types call open-source projects “security risks” and “commie nonsense”, but that doesn’t stop them from trying to acquire the work for free to profit off of it. It’s greedy and duplicitous. It’s capture.
By @beeboobaa3 - 8 months
Guess microsoft paid them off
By @InDubioProRubio - 8 months
Finally, the great IP washing machine hums and can dissolve the whole structure. Bring forth your disassembly, to generate a draft, to re-generate clean source code. Corporate communism! It is done!
By @coding123 - 8 months
It's always fun to see an AI copyright thread on HN: the same people who want copyright abolished suddenly want the strongest copyright ever to exist.