Microsoft's AI boss thinks it's perfectly OK to steal content if it's on open web
Microsoft's AI boss, Mustafa Suleyman, challenges copyright norms by advocating for free use of online content. His stance triggers debates on AI ethics and copyright laws in the digital era.
Microsoft's AI boss, Mustafa Suleyman, has sparked controversy by suggesting that content on the open web is fair game for anyone to copy and use freely, dubbing it "freeware." This belief contradicts copyright law, as creating content automatically grants copyright protection in the US. Suleyman's stance comes amidst lawsuits accusing Microsoft and OpenAI of using copyrighted online content to train AI models. While he acknowledges the importance of specifying restrictions in a robots.txt file, he downplays its legal weight compared to fair use. Suleyman's comments have raised concerns about the ethical use of AI and intellectual property rights. Despite the prevalence of AI companies justifying the use of copyrighted material under fair use, Suleyman's bold assertions have drawn attention to the complexities of copyright law in the digital age. The debate surrounding AI's access to and use of online content continues to evolve, with legal and ethical implications at the forefront of discussions in the tech industry.
Related
Microsoft's AI boss Suleyman has a curious understanding of web copyright law
Microsoft's AI boss, Mustafa Suleyman, suggests open web content is free to copy, sparking copyright controversy. AI firms debate fair use of copyrighted material for training, highlighting legal complexities and intellectual property concerns.
Microsoft says that it's okay to steal web content because it's 'freeware.'
Microsoft's CEO of AI, Mustafa Suleyman, believes web content is "freeware" for AI training unless specified otherwise. This stance has sparked legal disputes and debates over copyright infringement and fair use in AI content creation.
Microsoft CEO of AI: Your online content is 'freeware' fodder for training models
Mustafa Suleyman, CEO of Microsoft AI, faced legal action for using online content as "freeware" to train neural networks. The debate raises concerns about copyright, AI training, and intellectual property rights.
All web "content" is freeware
Microsoft's CEO of AI discusses open web content as freeware since the 90s, raising concerns about AI-generated content quality and sustainability. Generative AI vendors defend practices amid transparency and accountability issues. Experts warn of a potential tech industry bubble.
Microsoft AI CEO: Web content is 'freeware'
Microsoft's CEO discusses AI training on web content, emphasizing fair use unless restricted. Legal challenges arise over scraping restrictions, highlighting the balance between fair use and copyright concerns for AI development.
> I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.
> There’s a separate category where a website, or a publisher, or a news organization had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content.’ That’s a grey area, and I think it’s going to work its way through the courts.
This is like arguing that a guy who murdered someone 10 minutes ago should also be allowed to steal candy from a child, since the child put it down on the park bench.
robots.txt is a "grey area" to him, instead of being a directive to keep moving? Wow.
Maybe there should be a similar amount of openness in publishing the content used for training commercial models.
The copyright owner should have the right to ask for that content to be removed from training. This may also allow individual authors to earn their share with their own advanced RAG applications that focus specifically on the content they own and have published on the web.
Another thing is the copyright of the content, terms of use policies, etc.
Abiding by a robots.txt policy doesn't make you immune to copyright, terms of service, law in various jurisdictions, etc. If you think that, you are probably a kleptomaniac.
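For anyone unsure what "abiding by robots.txt" means mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the agent names `ExampleSearchBot` and `ExampleTrainingBot`, and the URL are all made up for illustration; this is not how any particular crawler is known to work. It shows only the voluntary check a polite crawler can make, and says nothing about copyright or terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one (made-up) search indexer is allowed,
# every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("ExampleSearchBot", "ExampleTrainingBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, page))
```

Nothing enforces the result: a crawler that ignores the answer can still download the page, which is why the file is a request rather than an access control.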
Just create a robots.txt with "User-Agent: one billion asterisks" so that the crawlers die when parsing it.
Programs from my youth (Daria, Captain N) had music licensed only for their broadcast, because what else was ever going to be done with them? 20 years later, streaming with the music intact is an impossibility because the money needed to license all of it is too much, and you have to make deals with dozens of companies.
Multiply that by several orders of magnitude and you start to see the scope of the problem.
Some people think of it as billboards posted on the highway. Some think it’s a bulletin board. Some think it’s a newspaper. A television, a “zine”, a diary, graffiti. It has been all of these things, and is and isn’t. And people who publish are really bad at explicitly stating which one they are. But they expect you to know.
It is classified as fair use. The term is "transformative use," where the intention of those using the content is to train models, if anyone wishes to Google it.
The end.
[1]: or even just an issue
That's how society falls.
The four factors of fair use - purpose of use, nature of the copyrighted work, amount used, and effect on the market - overwhelmingly favor allowing free use of openly published web content. The transformative nature of most reuses, the public availability of the original works, the necessity of using entire works in many cases, and the lack of a traditional market for such content all support this interpretation.
This longstanding practice has been the catalyst for unprecedented innovation and information dissemination. It represents a tacit social contract between content creators and users, establishing a de facto "freeware" model for open web content. Any attempt to retroactively impose strict copyright limitations would not only stifle innovation but also contradict decades of established legal precedent and digital norms.
As a side note, I’m not certain that training necessarily involves “copying.”
Lastly, if anyone really thinks the Roberts Court is going to kneecap AI, you’re soft in the head.