The Open Source AI Definition RC1 Is Available for Comments
The Open Source Initiative released RC1 of the Open Source AI Definition for community feedback, emphasizing training data sharing, complete code transparency, and targeting a final version release on October 28, 2024.
The Open Source Initiative has released the Release Candidate 1 (RC1) of the Open Source AI Definition, inviting community feedback. This version incorporates insights from five town hall meetings and discussions across various countries. Key updates include a requirement for sharing all training data, a clarification that code must be complete for downstream users to understand the training process, and the allowance for copyleft-like terms for code, data, and parameters. The definition emphasizes that while Open Source does not guarantee reproducibility, it should not hinder it. The focus of the drafting process will now shift to bug fixes and enhancing accompanying documentation, with a target release date of October 28 for version 1.0. The initiative aims to gather more endorsements and address new concerns raised by the community.
- Open Source AI Definition RC1 is open for community comments.
- Key changes include mandatory sharing of training data and complete code for transparency.
- Copyleft-like terms for code and data are now permissible.
- The initiative does not aim for reproducibility but ensures it is not obstructed.
- The final version is expected to be released on October 28, 2024.
Related
Not all 'open source' AI models are open: here's a ranking
Researchers found large language models claiming to be open source restrict access. Debate on AI model openness continues, with concerns over "open-washing" by tech giants. EU's AI Act may exempt open source models. Transparency and reproducibility are crucial for AI innovation.
Open Source AI Is the Path Forward
Mark Zuckerberg discusses the significance of open source AI, introducing Llama 3.1 405B as a frontier-level model. He collaborates with Amazon and Nvidia to support developers, emphasizing customization, transparency, and safety in AI development for a democratized and secure technological future.
The first release candidate of FreeCAD 1.0 is out
FreeCAD 1.0 release candidate is available for download, targeting users who prefer stability. Seven blockages remain, and user testing is encouraged to identify bugs and contribute to development.
Begun, the open source AI wars have.. This is going to be ugly. Really ugly.
The Open Source Initiative is finalizing a definition for open source AI, facing criticism for potentially allowing proprietary systems to claim open source status, with ongoing debates expected for years.
Policymakers Should Let Open Source Play a Role in the AI Revolution
The R Street Institute highlights the significance of open-source AI for innovation, noting a rise in investment from $900 million in 2022 to $2.9 billion in 2023, urging balanced regulations.
please dont let me discourage tho, i think this could be important work but if and only if this gets endorsement from >1 large model lab producing any interesting work
The only actually open source model I am aware of is AI2’s OLMo (https://blog.allenai.org/olmo-open-language-model-87ccfc95f5...), which includes training data, training code, evaluation code, fine tuning code, etc.
The license also matters. A burdened license that restricts what you can do with the software is not really open source.
I do have concerns about where OSI is going with all this. For example, why are they now saying that reproducibility is not a part of the definition? These two paragraphs below contradict each other - what does it mean to be able to “meaningfully fork” something and be able to make it more useful if you don’t have the ingredients to reproduce it in the first place?
> The aim of Open Source is not and has never been to enable reproducible software. The same is true for Open Source AI: reproducibility of AI science is not the objective. Open Source’s role is merely not to be an impediment to reproducibility. In other words, one can always add more requirements on top of Open Source, just like the Reproducible Builds effort does.
> Open Source means giving anyone the ability to meaningfully “fork” (study and modify) a system, without requiring additional permissions, to make it more useful for themselves and also for everyone.
> The aim of Open Source is not and has never been to enable reproducible software.
...
> Open Source means giving anyone the ability to meaningfully “fork” (study and modify) a system, without requiring additional permissions, to make it more useful for themselves and also for everyone.
...
> Forking in the machine learning context has the same meaning as with software: having the ability and the rights to build a system that behaves differently than its original status. Things that a fork may achieve are: fixing security issues, improving behavior, removing bias.
For these things, it does mean what most people are asking for: training details. So far companies are just releasing checkpoints and architecture. It is better than nothing and this is a great step (especially with how entrenched businesses are[1]). But if we really want to do things like fix security issues or remove bias, you have to be able to understand the data that it was originally trained on AND the training procedures. Both of these introduce certain biases (via statistical definition, which is more general). These issues can't all be solved by tuning and the ability to tune is significantly influenced by these decisions.
The reason we care about reproducible builds is because it matters to things like security, where we know what we're looking at is the same thing that's in the actual program. It is fair to say that the "aim" isn't about reproducible software, but it is a direct consequence of the software being open source. Trust matters, but the saying is "trust but verify". Sure, you can also fix vulns and bugs in closed source software, hell, you can even edit or build on top of it. But we don't call these things open source (or source available) for a reason.
If we're going to be consistent in our definitions, we need to understand what these things are at at least a minimal level of abstraction. And frankly, as a ML researcher, I just don't see it.
That said, I'm generally fine with "source available" and, like most people, use it synonymously with "open source". But if you're going to go around telling everyone they're wrong about the OSS definition, at least be consistent and stick to your values.
[0] https://opensource.org/osd
[1] Businesses whose entire model depends on OSS (by OS's definition) and freely available research
Just like a free grazing field would allow living animals, but not a combine harvester. The old rules of "for any purpose" no longer apply.
Okay, well just because you have the domain name "opensource.org" doesn't give you the ability to speak for the community, and the community's understanding of the term.
opensource.org is irrelevant.