June 25th, 2024

Not all 'open source' AI models are open: here's a ranking

Researchers found that many large language models claiming to be open source restrict access to their code and training data. Debate over AI model openness continues, with concerns about "open-washing" by tech giants. The EU's AI Act may exempt open source models from certain transparency requirements. Transparency and reproducibility are crucial for AI innovation.

Researchers have found that many large language models powering chatbots claim to be open source but restrict access to code and training data. The definition of open source in AI models is still debated, with advocates emphasizing the importance of full openness for scientific progress and AI accountability. A study by language scientists identified the most and least open models, revealing that some big tech companies engage in "open-washing" by claiming openness while disclosing minimal information. The European Union's upcoming Artificial Intelligence Act will have implications for models classified as open source, offering them exemptions from certain transparency requirements. Smaller players are noted for being more transparent compared to tech giants. The study also highlights concerns about the lack of transparency regarding the training data used in these models. The importance of openness for reproducibility and innovation in AI research is emphasized, with the need for models to be scrutinized and understood to assess their achievements accurately.

Related

OpenAI and Anthropic are ignoring robots.txt

Two AI startups, OpenAI and Anthropic, are reported to be disregarding robots.txt rules, allowing them to scrape web content despite claiming to respect such regulations. TollBit analytics revealed this behavior, raising concerns about data misuse.

Lessons About the Human Mind from Artificial Intelligence

In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

Apple Wasn't Interested in AI Partnership with Meta Due to Privacy Concerns

Apple declined an AI partnership with Meta due to privacy concerns, opting for OpenAI's ChatGPT integration into iOS. Apple emphasizes user choice and privacy in AI partnerships, exploring collaborations with Google and Anthropic for diverse AI models.

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers exploit vulnerabilities in AI models from OpenAI, Google, and xAI, sharing harmful content. Ethical hackers challenge AI security, prompting the rise of LLM security start-ups amid global regulatory concerns. Collaboration is key to addressing evolving AI threats.

6 comments
By @tazu - 5 months
For some snark, I'd include "OpenAI" in the table with X on all the metrics...
By @mirzap - 5 months
Which models claim that they are "open source" but are not? I think none of them. I think people confuse "open weight" with "open source." That confusion comes from the "internet AI experts," not the labs that produced them.
By @AndrewKemendo - 5 months
Standard open source rules apply. Unless you can:

Modify the source code

Compile it yourself*

And be able to do it on your own compute*

Then it’s not open

* Which would include the data and the “compiler” which is generally the non-FOSS cuDNN or CUDA drivers btw

*it’s almost like we had a mini-computer, microprocessor revolution 40 years ago for precisely the same reason

By @monkeydust - 5 months
I think France (Macron) helped push through the 'open' exemption, whatever that practically turns out to be, for the final EU AI Act... obviously not lost on people that that is where Mistral lives.
By @vhiremath4 - 5 months
When they say “LLM data”, does that usually include the tokenizer as well? Beginner question from someone at the end of Karpathy’s Zero to Hero course.