July 9th, 2024

Judge dismisses DMCA copyright claim in GitHub Copilot suit

A judge dismissed a DMCA claim against GitHub, Microsoft, and OpenAI over Copilot. The lawsuit alleged code suggestions lacked proper credit. Remaining claims involve license violation and breach of contract. Both sides dispute document production.

A judge has dismissed a Digital Millennium Copyright Act (DMCA) claim in a lawsuit against GitHub, Microsoft, and OpenAI regarding the Copilot coding assistant. The lawsuit, filed by developers, alleged that Copilot was suggesting code snippets from open-source projects without proper credit, violating intellectual property rights. The judge ruled that the code suggested by Copilot was not identical enough to the developers' work for the DMCA claim to apply. The case started with 22 claims, but most have been dismissed, leaving only two standing: an open-source license violation allegation and a breach of contract complaint. Both sides have been disputing document production during the discovery process. GitHub, Microsoft, and OpenAI maintain that Copilot adheres to laws and promotes responsible innovation in AI-powered software development. The plaintiffs argue that Copilot could generate identical code and have raised concerns about the handling of documents in the case.

Related

Microsoft's AI boss thinks it's perfectly OK to steal content if it's on open web

Microsoft's AI boss, Mustafa Suleyman, challenges copyright norms by advocating for free use of online content. His stance triggers debates on AI ethics and copyright laws in the digital era.

The Center for Investigative Reporting Is Suing OpenAI and Microsoft

The Center for Investigative Reporting (CIR) sues OpenAI and Microsoft for copyright infringement, alleging unauthorized use of stories impacting relationships and revenue. Legal action mirrors media concerns over tech companies using journalistic content without permission.

OpenAI Wants New York Times to Show How Original Its Copyrighted Articles Are

OpenAI requests New York Times' materials for copyright assessment amid infringement claim. Times objects to broad approach, fearing chilling effect. Legal battle showcases AI-copyright tension.

Coders' Copilot code-copying copyright claims crumble against GitHub, Microsoft

A judge dismissed a DMCA claim against GitHub, Microsoft, and OpenAI over Copilot. Remaining are claims of license violation and breach of contract. Dispute ongoing regarding discovery process. Defendants defend Copilot's compliance with laws.

OpenAI pleads it can't make money without using copyrighted material for free

OpenAI asks the British Parliament to permit the use of copyrighted material for AI training. The company faces legal challenges from the NYT and the Authors Guild over alleged copyright infringement. The debate affects both AI development and copyright protection, raising concerns for content creators.

72 comments
By @munificent - 6 months
> Indeed, last year GitHub was said to have tuned its programming assistant to generate slight variations of ingested training code to prevent its output from being accused of being an exact copy of licensed software.

If I, a human, were to:

1. Carefully read and memorize some copyrighted code.

2. Produce new code that is textually identical to that. But in the process of typing it up, I randomly mechanically tweak a few identifiers or something to produce code that has the exact same semantics but isn't character-wise identical.

3. Claim that as new original code without the original copyright.

I assume that I would get my ass kicked legally speaking. That reads to me exactly like deliberate copyright infringement with willful obfuscation of my infringement.

How is it any different when a machine does the same thing?

By @daedrdev - 6 months
> The anonymous programmers have repeatedly insisted Copilot could, and would, generate code identical to what they had written themselves, which is a key pillar of their lawsuit since there is an identicality requirement for their DMCA claim. However, Judge Tigar earlier ruled the plaintiffs hadn't actually demonstrated instances of this happening, which prompted a dismissal of the claim with a chance to amend it.

It sounds fair from how the article describes it

By @bityard - 6 months
This is pretty interesting, and I have conflicted feelings about the (seemingly obvious) outcome of this trial.

I wonder, if MS and OpenAI win, does that mean it will be legal for anyone to take the leaked source code for a proprietary product, train an LLM on it, and then ask the LLM to emit a version of it that is different enough to avoid copyright infringement?

That would be quite the double-edged sword for proprietary software companies.

By @hn_throwaway_99 - 6 months
A slight aside, but this is the subtitle:

> A few devs versus the powerful forces of Redmond – who did you think was going to win?

I hate that kind of obnoxious "journalism". Sometimes the little guy is actually wrong. To clarify, I'm not commenting on the specifics of this case, I just hate how fake our online discourse has been by appealing to "big guy evil" before even bringing up the specifics of the case.

By @mvdtnz - 6 months
What were the plaintiffs even thinking when they submitted a claim based on identicality without being able to produce a single instance of Copilot generating a verbatim copy? Even the research they submitted was unable to make a claim any stronger than "it's possible in theory but we've never seen it".
By @epolanski - 6 months
I am not strongly opinionated on this, but the very fact that Microsoft used all the code it could find, bar their own, has always looked suspicious to me.
By @lumb63 - 6 months
It seems to me that regardless of the outcome of this case, some developers do not want to have their code used to train LLMs. There may need to be a new license created to restrict this usage of software. Or, maybe developers will simply stop contributing open source. In today’s day and age, where open source code serves as a tool to pad Microsoft’s pockets, I certainly will not publish any of my software open source, despite how much I would like to (under GPL) in order to help fellow developers.

If I were Microsoft, I’d really be concerned that I’m going to kill my golden goose by causing a large-scale exodus from GitHub or open source development more generally. Another idea I’ve considered is publishing boatloads of useless or incorrect code to poison their training data.

As I see it, people should be able to restrict how people use something that they gave them. If some people prefer that their code is not used to train LLMs, there should be a way to enforce that.

By @perlgeek - 6 months
From the article:

> The anonymous programmers have repeatedly insisted Copilot could, and would, generate code identical to what they had written themselves, which is a key pillar of their lawsuit since there is an identicality requirement for their DMCA claim. However, Judge Tigar earlier ruled the plaintiffs hadn't actually demonstrated instances of this happening, which prompted a dismissal of the claim with a chance to amend it.

So, the problem is really one of the lack of evidence, which seems... like a pretty basic mistake from the plaintiffs?

They could've taken a screencap video back when Copilot still produced code more verbatim, and used that as evidence, I assume.

By @bsza - 6 months
Should we move to modified versions of FOSS licenses that forbid AI training?

Found this: https://github.com/non-ai-licenses/non-ai-licenses

Legally sound or not, these should at least prevent your code from being included in Copilot's training data, hopefully without affecting any other use case. I'm going to use one of these next time I start a new project.

By @cellis - 6 months
I would like to ask an obvious question to the legally inclined here. How is this any different than remixing a song (lyrics/audio)? It's not "identical", and doesn't output "verbatim" lyrics or audio. What is the distinction between <LLM> and <Singer/Remixer who outputs remixed lyrics/audio>. By a quick Google search it seems remixes violate copyright.
By @MagicMoonlight - 6 months
The issue I have is that these models are inherently trained to duplicate stuff. You train them by comparing the output to the original.

If I made an “advanced music engine” which rips Taylor swift files and duplicates them, I would be sued to oblivion. Why does calling it an AI suddenly fix that?

They should have to train them on information they legally own.

By @snvzz - 6 months
All GitHub needs to do to make most happy is offer an opt-out toggle.

It still doesn't.

By @slicktux - 6 months
Yet people keep feeding it their code by using GitHub as their repo… Just how we use the internet to share information; there’s just no escaping it.
By @passwordoops - 6 months
"The lack of documents from the Windows maker is apparently down to "technical difficulties" in collecting Slack messages"

Wait, I'm forced to use Teams at work but Microsoft employees are on Slack?!

By @yazzku - 6 months
> The judge disagreed, however, on the grounds that the code suggested by Copilot was not identical enough to the developers' own copyright-protected work, and thus section 1202(b) did not apply.

How did they reach this conclusion? How can you prove that it never copies a code snippet verbatim, versus just showing that it does for one specific code snippet? The latter is a lot easier to show, but I don't know what exactly it is that the prosecution claimed. I guess the size of the copy also matters in copyright violations?

By @nashashmi - 6 months
Big question: this thing called “training” AI off of data, how much of this is “training” and how much of this is “synthesizing”? It seems like if code is being copied and rephrased, it is synthetic. Not much “learning” and “training” going on here.
By @loceng - 6 months
This kind of argument makes me feel like it also supports the abolition of patents: eventually multiple other people will come up with the same obvious solution, which becomes obvious once a person spends enough time looking at a problem.
By @purpleblue - 6 months
Can you insist or put instructions that AIs do not train on your code? If they train on your code but don't produce the exact same output, is there any protection you can have from that?
By @chrismsimpson - 6 months
If this is how the law is applied for code, are we to expect this is also how it will be applied for other data (e.g. audio a la Udio and Suno)?
By @albertTJames - 6 months
Looking good! Go Copilot!
By @rolph - 6 months
copilot was apparently snipping license bearing comments, and applying "semantic" variations of the remaining code.

i would package the entire code as a series of comments, [ideally this would be snipped by the plagiarists] leaving a snippet of example code that no one of sound mind would allow to execute, being proffered by copilot.

By @WesternWind - 6 months
Wait... So Microsoft doesn't use Microsoft Teams, it uses Slack?
By @sagarpatil - 6 months
Off topic: How does the judiciary decide which judge to choose for such highly technical case?
By @Tomte - 6 months
That's Matthew Butterick's case.
By @chidli1234 - 6 months
Microsoft has deep pockets. Judges aren't objective. More at 11.
By @nancyp - 6 months
Linux/OSS is cancer. Said who? Anything in the public domain is up for grabs to them.

As long as the open tech community is too chicken to boycott their non-open-source stuff such as GitHub and LinkedIn, nothing will happen.

By @pledess - 6 months
I thought "the Copilot coding assistant was trained on open source software hosted on GitHub and as such would suggest snippets from those public projects to other programmers without care for licenses" was explicitly allowed by the GitHub Terms of Service: https://docs.github.com/en/site-policy/github-terms/github-t... "If you set your pages and repositories to be viewed publicly, you grant each User of GitHub a nonexclusive, worldwide license to use, display, and perform Your Content through the GitHub Service." In other words, in addition to what's allowed by the LICENSE file in your repo, you are also separately licensing your code "to use ... through the GitHub Service" and this would (in my interpretation) include use by Copilot for training, and use by Copilot to deliver snippets to any other GitHub user.