October 26th, 2024

Open washing – why companies pretend to be open source

Open washing is the practice of falsely claiming products are open source, particularly in AI. Models from major companies often do not meet open source criteria, threatening innovation and the integrity of the development community.


The term "open washing" refers to the practice where companies falsely claim their products or services are open source, despite not adhering to the principles of transparency and accessibility. This trend has gained traction, particularly in the AI sector, as companies like Meta promote their models, such as Llama 3, as open source, despite failing to meet the Open Source Initiative's (OSI) criteria. A study from Radboud University revealed that many prominent AI models from major companies, including Google and Microsoft, do not qualify as open source. The motivation behind open washing includes the desire to enhance public image and evade regulatory scrutiny, especially with the EU AI Act offering exemptions for genuinely open source models. The OSI is set to release a definition for open source AI, but current licenses often do not meet any established criteria. The implications of open washing extend beyond legal definitions; it threatens the integrity of open source development, complicating the use of code and potentially stifling innovation. Experts emphasize the importance of maintaining clear definitions to protect the open source ecosystem, as the misuse of the term can lead to broader issues affecting developers and businesses alike.

- Open washing is a deceptive practice where companies falsely claim their products are open source.

- Major AI models from companies like Meta, Google, and Microsoft often do not meet open source criteria.

- The EU AI Act provides incentives for companies to mislabel their models as open source.

- Clear definitions of open source are crucial to protect the integrity of the development community.

- Open washing can complicate code usage and hinder innovation in the tech industry.

28 comments
By @martin-t - 4 months
The second goal is muddying the waters and making people not care.

Say you're deciding between two programs (or AI models)[0]: you prefer an open source one, a colleague prefers one that just pretends to be open. You say your choice is preferable because it's open; he says the same about his. Then you say the dreaded "well, actually" and you sound like either a fundamentalist or an asshole.

[0]: None of those are truly open source because they're all trained on stolen data. And see? Now I sound like a fundamentalist.

By @neilv - 4 months
Open source was always a corporate-friendly compromise, but seemed like some of the people involved had a lot of integrity.

What we need is those open source people with integrity to put the smack down on those willfully abusing and destroying the terms.

If you can't do it with trademarks/certifications/licensing/memberships/etc., do it with mainstream journalism, as may be happening here. But The Register has long had rare insider knowledge and is relatively niche; you need to get the message out to everyone who's not already in the know, including lawmakers.

(Incidentally, the FSF also has integrity, but, besides promoting open source by being zero-compromise -- which is fine in their case -- they have an additional challenge: they seem clinically incapable of advocacy even in situations where interests are aligned.)

By @bubblesnort - 4 months
Open source never had any of the ethics or philosophy that free software has.

Free software > open source.

By @an_d_rew - 4 months
I have worked at multiple companies that vilified open source anything, while building their entire businesses on Linux, Java, Debian, and thousands of other "OSI Approved" software.

It's because, in my experience, the majority of businesses want to take but do not want to feel any obligation to give back or support.

By @mirekrusin - 4 months
True, this needs clarification that currently doesn't exist for large models, where training costs run to many millions and the binary artifact is both precious and malleable – unlike an ordinary compiled binary.

Regardless of whether – once OSI establishes its definition(s) – Meta chooses the path of adherence or not, they still deserve a paragraph of praise for what they're doing.

As a side note, OSI should also recognize that in the era of giant cloud providers, protection from predatory market participants is also a real need and should exist as a clear licensing option. The Mongo, Elastic, and Redis drama could be avoided in the future if there were a clear option to protect author-side sustainability without affecting the open source spirit for end users.

ps. I also believe that "Open <something>" should be a protected phrase, similar to how "Police", "Federal", "Government", or "Organic" are protected so as not to mislead the public, so we don't have things like the "OpenAI" nonsense.

By @mdaniel - 4 months
I can more readily(?) accept ones which mis-label their announcements of "Open Source!!1 under My Awesome License 1.0beta" than I can rug-pulls. Look, if you wanna use some rights-harming license and just shit on the term "Open Source," that's bad, but from a certain perspective understandable if the marketing folks don't grok the nuances of Open Source. The world is filled with misguided people, and I can just command-w the window and never use your product

But if you accept contributions from the community for years, and ingrain your product in hundreds of thousands of workflows around the world, and only then decide "holy shit, salaries cost money, best yank our license" that should be a case of fraud and you should be civilly liable, in my opinion

By @kvemkon - 4 months
Related:

OSI readies controversial open-source AI definition (26.10.2024)

https://news.ycombinator.com/item?id=41951421

By @scirob - 4 months
An egregious example is thirdweb, whose product is technically open sourced but written not to work without an API key, phoning home to their SaaS to check your API call limit.

https://github.com/thirdweb-dev/engine?tab=readme-ov-file

https://portal.thirdweb.com/engine/self-host

It makes me sad because I was working on getting a team together to build a real open source and free alternative, but once they found thirdweb they all got discouraged, thinking that no one would understand why our real open product is different.

By @Sytten - 4 months
Direct consequence, IMO, of our failure to popularize good licenses under another concept, like fair source, that sits in between open source and closed source. My small non-SaaS bootstrapped company could not survive if it were OSS, but maybe as fair source.

By @lordofgibbons - 4 months
> The pair found that while a handful of lesser-known LLMs, such as AllenAI's OLMo and BigScience Workshop + HuggingFace with BloomZ could be considered open, most are not.

It's absolutely wild to think the deranged BigScience RAIL license, under which the Bloom LLM was released, is open in any way shape or form. It has more user-harming restrictions than basically any other LLM license out there.

By @meehai - 4 months
I think "Open Weights" is a better name for AI models that don't share reproducible training scripts and data.

By @ahaucnx - 4 months
I believe companies, or rather decision makers, are often afraid of going fully open source because they invested a lot of money into the product and worry that another company will use it, offer it cheaper, and ultimately harm the originator.

So even if they believe in open source, they put protections in place that ultimately lock it down and thus make it closed source, while trying to keep the impression of being open.

In our journey at AirGradient towards becoming fully open-source hardware (all code and hardware licensed under CC-BY-SA), we had the same concerns but ultimately decided to go full-in and open up everything with an officially approved open-source license.

I believe there are a few important aspects and "protections" that are open-source compatible that help companies protect their investments.

Firstly, requiring attribution is compatible with open source and can bring companies a lot of visibility; competitors probably don't want to attribute another company and are thus less likely to clone.

Secondly, using a share-alike license also makes the code unattractive for many other companies to use.

Lastly, I believe the code itself is often not the valuable part compared to the brand value, employees, reputation, business model, network and implicit knowledge that a company builds up.

It really worked for us to go that way with a true open-source license and I hope many others will do it too.

There are already some easy-to-understand licenses like CC in place, and I hope they also create awareness around "open washing".

By @simonw - 4 months
"Would it surprise you to know that according to the study, the big-name ones from Google, Meta, and Microsoft aren't? I didn't think so."

Microsoft has a decent LLM that I'd consider to be "open source": Phi-3.5, under the MIT license: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

By @rvnx - 4 months
At the same time, Facebook is making some of the best efforts toward open AI, so it's a bit hard to blame them. They are not perfect, but they shared the most important artifact created out of tens of millions of USD (or even more) in spending – the model weights, though not the dataset – and that is a major step forward.

By @rietta - 4 months
I attended the referenced talk by Dan Lorenc in Alpharetta this week. It was very interesting. He hammered on how many licenses flunk the OSI test despite claiming to be open source.

By @blackeyeblitzar - 4 months
It’s easy: they’re draining the phrase “open source” of meaning while gaining by marketing themselves that way. It’s fraudulent, but also just exploitative.

By @gradientsrneat - 4 months
Article commenter points out that Meta is a funder of the OSI. We'll see if that affects how the OSI defines "open" AI models.

I find it funny how OpenAI was only indirectly mentioned. Still, I'm glad that this columnist is taking a principled stance by arguing against one of the more borderline cases.

By @stonethrowaway - 4 months
I’ve commented on these moves and jukes a few months ago. In the spirit of not reposting, the original is here: https://news.ycombinator.com/item?id=41090142
By @pabs3 - 4 months
I like Debian's policy for libre AI:

https://salsa.debian.org/deeplearning-team/ml-policy/

By @tzs - 4 months
> The Open Source Initiative (OSI) spells it out in the Open Source Definition, and Llama 3's license – with clauses on litigation and branding – flunks it on several grounds.

Anyone know specifically what he is talking about here?

The only things I'm seeing that I would consider clauses on litigation are one that terminates your license if you sue them claiming Llama 3 or its output violates your IP, and a choice-of-venue and choice-of-forum clause.

Several OSI-approved licenses have "terminate on patent suit" clauses. Llama 3's is termination on an IP suit rather than just a patent suit, but I don't see anything in the OSD where that would make a difference.

There's stuff about trademarks, which I assume are the branding clauses he mentions. But I don't see anything obvious on the OSD that such clauses violate.

By @yieldcrv - 4 months
“open source” doesn't only mean what the “open source community” has memed it to mean

personally, I think the term should be avoided if it's not what the open source community has made a culture around

but I can't say it's weaselly corporate “open washing”, either, because it's the open source community that appropriated the term to mean a subset of free, open, commercial-use licenses plus everything digital that's necessary to replicate the product – not the other way around, where corporations are suddenly using some legalese to turn it into a marketing term that's technically okay

By @alexashka - 4 months
They pretend because advertising and marketing is legal.

By @cranberryturkey - 4 months
heh. i've seen this a lot lately.
By @mlinksva - 4 months
> the EU still doesn't have a clear definition of open source AI

One can debate "clear", but the AI Act https://eur-lex.europa.eu/eli/reg/2024/1689/oj does say in Recitals 102-104 (mini open source license definition *highlighted*):

---

(102) Software and data, including models, released under a free and open-source licence that allows them to be openly shared and where users can freely access, use, modify and redistribute them or modified versions thereof, can contribute to research and innovation in the market and can provide significant growth opportunities for the Union economy. General-purpose AI models released under free and open-source licences should be considered to ensure high levels of transparency and openness if their parameters, including the weights, the information on the model architecture, and the information on model usage are made publicly available. *The licence should be considered to be free and open-source also when it allows users to run, copy, distribute, study, change and improve software and data, including models under the condition that the original provider of the model is credited, the identical or comparable terms of distribution are respected.*

(103) Free and open-source AI components covers the software and data, including models and general-purpose AI models, tools, services or processes of an AI system. Free and open-source AI components can be provided through different channels, including their development on open repositories. For the purposes of this Regulation, AI components that are provided against a price or otherwise monetised, including through the provision of technical support or other services, including through a software platform, related to the AI component, or the use of personal data for reasons other than exclusively for improving the security, compatibility or interoperability of the software, with the exception of transactions between microenterprises, should not benefit from the exceptions provided to free and open-source AI components. The fact of making AI components available through open repositories should not, in itself, constitute a monetisation.

(104) The providers of general-purpose AI models that are released under a free and open-source licence, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available should be subject to exceptions as regards the transparency-related requirements imposed on general-purpose AI models, unless they can be considered to present a systemic risk, in which case the circumstance that the model is transparent and accompanied by an open-source license should not be considered to be a sufficient reason to exclude compliance with the obligations under this Regulation. In any case, given that the release of general-purpose AI models under free and open-source licence does not necessarily reveal substantial information on the data set used for the training or fine-tuning of the model and on how compliance of copyright law was thereby ensured, the exception provided for general-purpose AI models from compliance with the transparency-related requirements should not concern the obligation to produce a summary about the content used for model training and the obligation to put in place a policy to comply with Union copyright law, in particular to identify and comply with the reservation of rights pursuant to Article 4(3) of Directive (EU) 2019/790 of the European Parliament and of the Council (40).

---

In the articles, open source is expressly referred to as release under an open-source licence (see definition in the recitals above):

---

[Article 2: Scope]

12. This Regulation does not apply to AI systems released under free and open-source licences, unless they are placed on the market or put into service as high-risk AI systems or as an AI system that falls under Article 5 or 50.

[Article 25: Responsibilities along the AI value chain]

4. The provider of a high-risk AI system and the third party that supplies an AI system, tools, services, components, or processes that are used or integrated in a high-risk AI system shall, by written agreement, specify the necessary information, capabilities, technical access and other assistance based on the generally acknowledged state of the art, in order to enable the provider of the high-risk AI system to fully comply with the obligations set out in this Regulation. This paragraph shall not apply to third parties making accessible to the public tools, services, processes, or components, other than general-purpose AI models, under a free and open-source licence.

[Article 54: Authorised representatives of providers of general-purpose AI models]

6. The obligation set out in this Article shall not apply to providers of general-purpose AI models that are released under a free and open-source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available, unless the general-purpose AI models present systemic risks.

By @nmstoker - 4 months
See also: "AI Washing".

Externally done to give a kick to sales efforts.

And internally done in an attempt to get someone with AI resources to build blatantly non-AI functions by sticking them onto something with no or very little genuine AI angle.

By @teddyh - 4 months
Cue the several weasels who regularly turn up, arguing that “Open Source” can mean whatever they say it means, since they don’t accept the OSI definition.