Llama 3.1 Official Launch
Meta introduces Llama 3.1, an open-source AI model available in 8B, 70B, and 405B versions. The 405B model is highlighted for its versatility across use cases, including multilingual agents and analysis of large documents. Users can leverage coding assistants, real-time or batch inference, and fine-tuning capabilities. Meta emphasizes open-source AI and offers subscribers updates via a newsletter.
Read original article
Meta has introduced Llama 3.1, an open-source AI model available in 8B, 70B, and 405B versions. The 405B model is highlighted as the flagship foundation model, supporting a wide range of use cases. Users can leverage Llama's capabilities to build advanced applications such as multilingual agents, complex reasoning, and analysis of large documents with context windows of up to 128K tokens. The platform offers coding assistants for tasks like maze generation and provides services for real-time or batch inference. Users can fine-tune, distill, and deploy models for their applications, using synthetic data generation and partner starter guides. Llama 3.1's performance is benchmarked across various categories, showcasing its effectiveness on different tasks. Meta frames open-source AI as the way forward and encourages users to explore the latest updates and models. Subscribers can stay informed about Llama's developments by signing up for the newsletter.
Related
Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude 3 Opus. It emphasizes training enhancements, data quality, and the competition between open- and closed-source models.
LLMs on the Command Line
Simon Willison presented a Python command-line utility for accessing Large Language Models (LLMs) efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content similarity queries and extensibility through plugins and OpenAI API compatibility.
Show HN: Perplexity (llama3 70B) Inline Bot on Telegram
The Llama 3 AI bot on Telegram provides internet access for knowledge sharing. Users can ask questions, summarize content, get programming help, and choose between monthly or yearly plans for a fee. Refunds within 30 days are possible.
Benchmarking LLM Inference Back Ends: VLLM, LMDeploy, MLC-LLM, TensorRT-LLM, TGI
Selecting the right inference backend for large language models is crucial for user experience and cost efficiency. A benchmark study by BentoML compared various backends, highlighting LMDeploy's decoding performance, vLLM's low TTFT, and considerations beyond performance. BentoML and BentoCloud are recommended tools for efficient AI model deployment.
Gemma 2 on AWS Lambda with Llamafile
Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.
Open source AI is the path forward - https://news.ycombinator.com/item?id=41046773 - July 2024 (278 comments)
Quick comparison with GPT-4o:
+----------------+-------+-------+
| Metric | GPT-4o| Llama |
| | | 3.1 |
| | | 405B |
+----------------+-------+-------+
| MMLU | 88.7 | 88.6 |
| GPQA | 53.6 | 51.1 |
| MATH | 76.6 | 73.8 |
| HumanEval | 90.2 | 89.0 |
| MGSM | 90.5 | 91.6 |
+----------------+-------+-------+
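As a quick sanity check, the gaps in the comparison table can be computed programmatically. This is a small sketch; the scores are copied verbatim from the table above:

```python
# Benchmark scores from the comparison table: (GPT-4o, Llama 3.1 405B).
scores = {
    "MMLU":      (88.7, 88.6),
    "GPQA":      (53.6, 51.1),
    "MATH":      (76.6, 73.8),
    "HumanEval": (90.2, 89.0),
    "MGSM":      (90.5, 91.6),
}

# Positive delta: GPT-4o ahead; negative: Llama 3.1 405B ahead.
deltas = {name: round(gpt4o - llama, 1) for name, (gpt4o, llama) in scores.items()}
print(deltas)
```

MGSM is the only benchmark in the table where Llama 3.1 405B comes out ahead; the rest are within a few points.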
GPT-4o 30.7
GPT-4 turbo (2024-04-09) 29.7
Llama 3.1 405B Instruct 29.5
Claude 3.5 Sonnet 27.9
Claude 3 Opus 27.3
Llama 3.1 70B Instruct 26.4
Gemini Pro 1.5 0514 22.3
Gemma 2 27B Instruct 21.2
Mistral Large 17.7
Gemma 2 9B Instruct 16.3
Qwen 2 Instruct 72B 15.6
Gemini 1.5 Flash 15.3
GPT-4o mini 14.3
Llama 3.1 8B Instruct 14.0
DeepSeek-V2 Chat 236B (0628) 13.4
Nemotron-4 340B 12.7
Mixtral-8x22B Instruct 12.2
Yi Large 12.1
Command R Plus 11.1
Mistral Small 9.3
Reka Core-20240501 9.1
GLM-4 9.0
Qwen 1.5 Chat 32B 8.7
Phi-3 Small 8k 8.4
DBRX 8.0
If you want to learn more, there is a writeup at https://wow.groq.com/now-available-on-groq-the-largest-and-m....
(disclaimer, I am a Groq employee)
Statement from Mark: https://about.fb.com/news/2024/07/open-source-ai-is-the-path...
If you want a playground to test this model locally or want to quickly build some applications with it, you can try LLMStack (https://github.com/trypromptly/LLMStack). I wrote last week about how to configure and use Ollama with LLMStack at https://docs.trypromptly.com/guides/using-llama3-with-ollama.
Disclaimer: I'm the maintainer of LLMStack
Examples: OpenAI's GPT-4o mini is second only to 4o on LMSys Overall, but is 6.7 points behind 4o on MMLU, so it's "punching above its weight" in real-world contexts. The Gemma series (9B and 27B) are similar, both beating the mean in terms of Elo per MMLU point. Microsoft's Phi series are all below the mean, meaning they have strong MMLU scores but aren't preferred in real-world contexts.
Llama 3 8B previously did substantially better than the mean on LMSys Overall, so hopefully Llama 3.1 8B will be even better! The 70B variant was, interestingly, right on the mean. Hopefully the 405B variant won't fall below!
Open source models are very exciting for self-hosting, but per-token hosted inference pricing hasn't been competitive with OpenAI and Anthropic, at least for a given tier of quality (e.g., Llama 3 70B costs between $1 and $10 per million tokens across platforms, while Claude 3.5 Sonnet is $3 per million).
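The pricing comparison above is straightforward arithmetic; a short sketch makes the spread concrete. The per-million-token prices are the ones quoted in the comment, and the monthly volume is a hypothetical figure chosen for illustration:

```python
def monthly_cost(tokens_millions: float, price_per_m: float) -> float:
    """USD cost for a given monthly token volume at a per-million-token price."""
    return tokens_millions * price_per_m

volume = 50  # hypothetical workload: 50M tokens/month
providers = {
    "Llama 3 70B (cheapest host quoted)": 1.0,
    "Llama 3 70B (priciest host quoted)": 10.0,
    "Claude 3.5 Sonnet": 3.0,
}
for name, price in providers.items():
    print(f"{name}: ${monthly_cost(volume, price):,.0f}/month")
```

At the low end of the quoted range, self-serve Llama hosting undercuts Sonnet 3x; at the high end it's 3x more expensive, which is the commenter's point about quality-tier pricing.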
[1]: https://github.com/meta-llama/llama-models/blob/main/models/...
[2]: https://github.com/meta-llama/llama-recipes/blob/main/recipe...
Have other major models explicitly communicated that they're trained on synthetic data?
https://aider.chat/docs/leaderboards/
77.4% claude-3.5-sonnet
75.2% DeepSeek Coder V2 (whole)
72.9% gpt-4o
69.9% DeepSeek Chat V2 0628
68.4% claude-3-opus-20240229
67.7% gpt-4-0613
66.2% llama-3.1-405b-instruct (whole)
Llama 3 Training System

  Cluster 1: 24,000 GPUs @ 400+ TFLOPS per GPU  ~ 9.6 exaFLOPS
  Cluster 2: 24,000 GPUs @ 400+ TFLOPS per GPU  ~ 9.6 exaFLOPS
  ------------------------------------------------------------
  Total:     48,000 GPUs, 19,200,000 TFLOPS     ~ 19.2 exaFLOPS
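The figure's totals check out arithmetically; a few lines verify them using only the numbers stated in the figure:

```python
gpus_per_cluster = 24_000
tflops_per_gpu = 400   # "400+ TFLOPS per GPU", taking the stated floor
clusters = 2

cluster_tflops = gpus_per_cluster * tflops_per_gpu  # per-cluster throughput
total_tflops = clusters * cluster_tflops            # 19,200,000 TFLOPS
total_exaflops = total_tflops / 1_000_000           # 1 exaFLOPS = 10^6 TFLOPS
print(total_exaflops)
```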
On a related note, for those interested in experimenting with large language models locally, I've been working on an app called Msty [1]. It allows you to run models like this with just one click and features a clean, functional interface. Just added support for both 8B and 70B. Still in development, but I'd appreciate any feedback.
[1]: https://msty.app
Let us know if you have other needs!
Open Source AI Is the Path Forward
https://about.fb.com/news/2024/07/open-source-ai-is-the-path...
Seems like the biggest GPU node they have is the p5.48xlarge @ 640GB (8xH100s). Routing between multiple nodes would be too slow unless there's an InfiniBand fabric you can leverage. Interested to know if anyone else is exploring this.
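A back-of-envelope estimate shows why a single 640 GB node is tight for a 405B-parameter model. This sketch counts weights only; KV cache and activation memory, which it ignores, make the real requirement higher:

```python
params_b = 405   # parameters, in billions
node_gb = 640    # p5.48xlarge: 8x H100 80GB

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only memory in GB (1B params at 1 byte/param ~ 1 GB)."""
    return params_billion * bytes_per_param

for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    need = weights_gb(params_b, bpp)
    verdict = "fits" if need < node_gb else "does not fit"
    print(f"{label}: ~{need:.0f} GB -> {verdict} in {node_gb} GB")
```

At fp16 the weights alone (~810 GB) exceed the node, which is why multi-node routing or quantization comes up at all; int8 (~405 GB) leaves headroom for cache and activations on a single node.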
And answer queries like:
Give all <myObject> which refer to <location> which refer to an Indo-European <language>.
https://github.com/meta-llama/llama-models/blob/main/models/...
Would love to hear your feedback!
Meta's goal from the start was to target OpenAI and the other proprietary model players with a "scorched earth" approach by releasing powerful open models to disrupt the competitive landscape.
Meta can likely outspend any other AI lab on compute and talent:
- OpenAI generates an estimated $2B in revenue and is likely unprofitable. Meta generated $134B in revenue and $39B in profit in 2023.
- Meta's compute resources likely outrank OpenAI by now.
- Open source likely attracts better talent and researchers.
- One possible outcome could be the acquisition of OpenAI by Microsoft to catch up with Meta.
The big winners of this: devs and AI product startups