October 28th, 2024

Open-source AI must reveal its training data, per new OSI definition

The Open Source Initiative has defined "open" AI, requiring disclosure of training data and code, challenging companies like Meta, whose Llama model does not comply, amid ongoing debates on open-source values.

The Open Source Initiative (OSI) has established a new definition for "open" artificial intelligence, which mandates that AI systems disclose their training data, code, and settings. This definition poses a challenge to companies like Meta, whose Llama model does not meet these criteria because of restrictions on commercial use and the lack of access to its training data. Meta has disagreed with OSI's definition, arguing that it oversimplifies the complexities of modern AI. The OSI's definition aims to combat "open washing," where companies falsely claim their products are open source. The definition took two years to develop and involved consultations with experts in various fields. Critics suggest that Meta's reluctance to share training data stems from a desire to protect its competitive edge and minimize legal risks associated with copyright issues. OSI's executive director noted parallels between Meta's current stance and Microsoft's historical resistance to open-source principles. The ongoing debate highlights the tension between traditional open-source values and the evolving landscape of AI technology.

- OSI's new definition requires AI systems to disclose training data, code, and settings.

- Meta's Llama model does not comply with OSI's open-source standards.

- The definition aims to prevent companies from misrepresenting their products as open source.

- Critics argue that Meta's data restrictions are motivated by competitive and legal concerns.

- The discussion reflects broader tensions between open-source principles and AI development.

4 comments
By @wruza - 4 months
OSI is not an “open-source” trademark holder, though. It’s basically an opinion, because current OSI-approved licenses do not include this explicitly.

I guess I’m built different, but all this open-washing noise makes no sense to me. I don’t think “oh, open and free as in both” every time I see “open” in the wild from a bigcorp, especially in an area that has just emerged and is clearly nuanced in the “source” part. I mean, yes, it’s not fully correct, but everyone to whom this matters figures it out more or less immediately. What is even being washed here? These semantic arguments are the least of the problems we’ll have with all that power concentration, assuming the current tech is worth anything outside of its bubble.

By @Havoc - 4 months
I doubt anyone in llm world is gonna care.

In practice the key criteria seem to be:

1) Can I get the weights

2) Is commercial use permitted

There are more nuances, sure, but if those are met, many consider it open in the non-Stallman sense.

By @ChrisArchitect - 4 months
By @talldayo - 4 months
Once again proving that the OSI is a fringe organization that is almost entirely ignored in practical prosecution of Open Source licensing.