October 28th, 2024

Open-source AI must reveal its training data, per new OSI definition

The Open Source Initiative has defined "open" AI, requiring disclosure of training data and code, challenging companies like Meta, whose Llama model does not comply, amid ongoing debates on open-source values.

The Open Source Initiative (OSI) has established a new definition for "open" artificial intelligence, which mandates that AI systems disclose their training data, code, and settings. This definition poses a challenge to companies like Meta, whose Llama model does not meet these criteria due to restrictions on commercial use and lack of access to training data. Meta has expressed disagreement with OSI's definition, arguing that it oversimplifies the complexities of modern AI. The OSI's definition aims to combat "open washing," where companies falsely claim their products are open source. The initiative took two years to develop, involving consultations with experts in various fields. Critics suggest that Meta's reluctance to share training data stems from a desire to protect its competitive edge and minimize legal risks associated with copyright issues. OSI's executive director noted parallels between Meta's current stance and Microsoft's historical resistance to open-source principles. The ongoing debate highlights the tension between traditional open-source values and the evolving landscape of AI technology.

- OSI's new definition requires AI systems to disclose training data, code, and settings.

- Meta's Llama model does not comply with OSI's open-source standards.

- The definition aims to prevent companies from misrepresenting their products as open source.

- Critics argue that Meta's data restrictions are motivated by competitive and legal concerns.

- The discussion reflects broader tensions between open-source principles and AI development.

Begun, the open source AI wars have.. This is going to be ugly. Really ugly.

The Open Source Initiative is finalizing a definition for open source AI, facing criticism for potentially allowing proprietary systems to claim open source status, with ongoing debates expected for years.

The OSI lacks competence to define Open Source AI

The Open Source Initiative faces criticism for its handling of the Open Source AI Definition, with concerns over expertise, censorship, and transparency, as it approaches a deadline of October 28, 2024.

The open secret of open washing – why companies pretend to be open source

Open washing involves companies falsely claiming their products are open source, particularly in AI. This threatens collaboration, increases legal issues, and hinders innovation, highlighting the need for clear definitions.

Meta under fire for 'polluting' open-source

Meta's labeling of its Llama AI models as "open-source" has drawn criticism for being misleading, as they do not fulfill full open-source criteria, prompting calls for greater transparency in AI development.

Open-Access AI: Lessons from Open-Source Software

Open-access AI models such as Meta's Llama impose usage restrictions yet are misleadingly labeled "open-source." Access to training data is essential for innovation, and withholding it raises concerns about monopolistic control over AI advancements.

4 comments

By @wruza - 4 months

OSI is not an “open-source” trademark holder though. It’s basically an opinion, cause current OSI-approved licenses do not include this explicitly.

I guess I’m built different, but all this open-washing noise makes no sense to me. I don’t think “oh, open-free as in both” every time I see “open” in the wild from a bigcorp. Especially in the area that just emerged and is clearly nuanced in the “source” part. I mean, yes, it’s not fully correct, but also everyone to whom this matters figures it out sorta immediately. What is even washing here? These semantic arguments are the least problem we’ll have with all that power concentration, that is assuming the current tech is worth anything outside of its bubble.

By @Havoc - 4 months

I doubt anyone in the LLM world is gonna care.

In practice the key criteria seem to be:

1) Can I get the weights

2) Is commercial use permitted

There are more nuances, sure, but if those are met then many consider it open in the non-Stallman sense.

By @ChrisArchitect - 4 months

[dupe] https://news.ycombinator.com/item?id=41951421

By @talldayo - 4 months

Once again proving that the OSI is a fringe organization that is almost entirely ignored in practical prosecution of Open Source licensing.
