June 29th, 2024

All web "content" is freeware

Microsoft's CEO of AI claims that open web content has been treated as freeware since the 90s, raising concerns about the quality and sustainability of AI-generated content. Generative AI vendors defend their practices despite transparency and accountability problems. The author suggests the tech industry may be entering a bubble.

The article discusses a CNBC interview with Microsoft's CEO of AI, where he claims that content on the open web has been considered freeware since the 90s, allowing anyone to copy, recreate, or reproduce it. The interview highlights concerns about the sustainability and quality of AI-generated content, with indications that peak AI may be approaching. Generative AI vendors are defending their practices by arguing that everything is fair game. The author points out a shift in the perception of AI tools and emphasizes the lack of transparency and accountability in generative AI chatbots compared to search engines. The article concludes by suggesting that the tech industry may be entering a bubble as some experts start to believe their own narratives. The piece reflects on ethical and legal implications surrounding generative AI and the evolving landscape of online content creation.

OpenAI and Anthropic are ignoring robots.txt

Two AI startups, OpenAI and Anthropic, are reported to be disregarding robots.txt rules and scraping web content despite claiming to respect those rules. Analytics from TollBit revealed this behavior, raising concerns about data misuse.

The Encyclopedia Project, or How to Know in the Age of AI

Artificial intelligence challenges information reliability online, blurring real and fake content. An anecdote underscores the necessity of trustworthy sources like encyclopedias. The piece advocates for critical thinking amid AI-driven misinformation.

Jack Dorsey says we won't know what is real anymore in the next 5-10 years

Jack Dorsey and Elon Musk express concerns about AI-generated deep fakes blurring reality. OpenAI's efforts fall short, emphasizing the importance of human intervention in maintaining authenticity amidst AI advancements.

Microsoft's AI boss Suleyman has a curious understanding of web copyright law

Microsoft's AI boss, Mustafa Suleyman, suggests open web content is free to copy, sparking copyright controversy. AI firms debate fair use of copyrighted material for training, highlighting legal complexities and intellectual property concerns.

Microsoft says that it's okay to steal web content because it's 'freeware.'

Microsoft's CEO of AI, Mustafa Suleyman, believes web content is "freeware" for AI training unless specified otherwise. This stance has sparked legal disputes and debates over copyright infringement and fair use in AI content creation.

26 comments

By @bdw5204 - 11 months

This statement from Microsoft is just asking for a copyright infringement lawsuit because the courts have been very clear that web "content" is copyrighted unless it is explicitly placed in the public domain or old enough to no longer be under copyright.

Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.

By @charonn0 - 11 months

> Anyone can copy it, recreate with it, reproduce with it

He seems to be confusing "freeware", which is basically a license for copyrighted work, with "public domain", which is the absence of a copyright.

By @bdcravens - 11 months

> Perhaps that’s why he bookended his claims with “since the 90s”

No, it's because the web has existed since 1991. (Though for the purists, the paper was written in 1989 and the first browser was developed in 1990.)

https://www.npr.org/2021/08/06/1025554426/a-look-back-at-the...

By @croes - 11 months

I bet if any other company did it instead of MS they would sue the hell out of them for using their data.

By @boesboes - 11 months

Without trying to take a stance on this, I do have to say I like the FastGPT feature that comes with Kagi. It basically does a search and uses those results to answer questions.

Now I'd just want it to have a better UI with history and some sort of notebook mode instead of chat. I'm not sure how, but I don't want to chat with AI, I want a different way to 'instruct' it.

By @tjpnz - 11 months

I intend to use Mustafa Suleyman's likeness and name for my next project. It's part comic book/part novel and tells the story of a socially awkward tech CEO getting way out of his comfort zone by moonlighting as a male porn star. It ends with an OJ Simpson style police chase when it's discovered that Mustafa has been embezzling funds to support a drug habit and addiction to plastic surgery.

By @scotty79 - 11 months

> But that means torrents of Windows are freeware!

For many, many years now, if you need Windows you can just download it from Microsoft and run a simple, non-intrusive activation procedure (not from Microsoft) after installation. No cracks needed. As much security as a hip-high front porch gate.

So even for MS the understanding was that these things are de facto freeware for anyone that wants them at all.

By @Sophira - 11 months

Has everyone forgotten the furore that was Cook's Source Magazine stealing a recipe that was published online?

https://yro.slashdot.org/story/10/11/04/1940257/cooks-magazi...

By @kkfx - 11 months

I agree, so please, Microsoft, shut your mouth if I grab your maps, wrap your services and so on, because they are web-based, so I am free to do whatever I like with them; the relevant licenses do not count.

By @ralferoo - 11 months

More discussion on similar article: https://news.ycombinator.com/item?id=40828438

By @wooptoo - 11 months

> search engines link to their sources! Chatbots don’t.

Actually Copilot does provide links to its sources, which adds credibility and promotes further exploration.

By @fundad - 11 months

How did they train auto-completers or classifiers if they didn’t train on the open web? How did Pandora train if not on copyrighted music?

By @seoulmetro - 11 months

It's true. People don't like it, but it's true.

If you provide content you created online for free, that content is now freeware.

If someone provides content that they didn't create that still has copyright restrictions in real life, that isn't freeware.

It's like all the photos uploaded to Facebook and Instagram are now free to use however the downloader wants (and Meta as well of course). It's true. But people don't like it.

By @scotty79 - 11 months

> Don’t blame us, the Torment Nexus is established practice!

Well, it is. And I for one, am absolutely delighted that some people with money finally have an incentive to accept that after three decades of copyright death throes.

By @Almondsetat - 11 months

Copilot links to its sources. The author should reconsider having this blatantly false and easily verifiable article up on their website

By @namds - 11 months

Now that we have established that Microsoft information wants to be free, my next project is wget.ai:

wget.ai is a sophisticated real time LLM that trains itself while downloading "content". Like any LLM, it predicts the next output token (byte in this case) based on the statistical training. wget.ai is run at temperature zero. In this revolutionary setting it has arrived at the conclusion that the most likely output byte equals the input byte!

Armed with this theorem, wget.ai can transform and replicate a Windows 11 download in real time. No copying is involved, the advanced algorithms happen to arrive at input == output.

Users of Windows 11 can download activation keys (freeware) from the Internet.

By @edent - 11 months

I like the fact that I can now reproduce any Microsoft content without paying for it. Cheers!

Incidentally, some AI chatbots do link to their sources. And it is a good idea to make that an explicit prompt if you're using one that doesn't. It's also worth prompting for how recent their information is.

By @rchaud - 11 months

DRM and paywalls for thee, industrial-scale scraping for me. /s

It's time for us to build our own miniature versions of Internet Archive with the content that is personally important to us. The powers that be will take it down under the guise of defending copyright, while the bigcos continue to suck up every letter of every page that has a publicly available URL.

By @jampekka - 11 months

I find it good that the concept of IP is collapsing, but this shows clearly the corporate dishonesty around it. For decades corporate sites and APIs have pushed all sorts of illegal EULAs and ToSs in an attempt to e.g. ban scraping. Now suddenly all of this is scrapped, with of course no explanation given as to why.

By @sublinear - 11 months

In a world where physical media is no longer relevant and everything is on the internet, what the hell is "web content"?

By @prmoustache - 11 months

So basically I can create a whinedows website with microsoft windows logo on it right?
