Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
The platform runs almost any open-source large language model from a Hugging Face repo link, using vLLM and a custom GPU scheduler. It is free during beta, with plans for competitive post-beta pricing enabled by multi-tenant model serving.
The platform offers users the ability to run almost any open-source large language model by pasting a link to the Hugging Face repo. It uses vLLM and a custom-built GPU scheduler to support models like Meta Llama 3.1, Qwen 2, Mixtral 8x, Gemma 2, Deepseek V2 Coder Lite, Phi-3, and more. Users can access up to eight Nvidia A100 80GB GPUs. The service is free during the beta phase while the team refines operations and pricing. Once out of beta, the platform aims to offer pricing competitive with major cloud GPU vendors by leveraging multi-tenant model serving. For popular models, the platform proxies to always-on inference providers; for custom models, it spins clusters up and shuts them down on demand.
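For a sense of what the on-demand path involves, here is a minimal sketch using vLLM's offline API, assuming an eight-GPU node like the A100 80GB machines mentioned above. The repo ID and sampling settings are illustrative choices, not the platform's actual configuration:

```python
# Minimal sketch of serving a Hugging Face model with vLLM --
# roughly the kind of setup the platform's scheduler would automate.
# Assumes vLLM is installed, 8 GPUs are visible, and the repo is accessible.
from vllm import LLM, SamplingParams

# Any Hugging Face repo link reduces to a repo ID like this (illustrative).
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tensor_parallel_size=8,  # shard the weights across the 8x A100 node
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What version of Llama are you?"], params)
print(outputs[0].outputs[0].text)
```

Tensor parallelism is what makes 70B-class checkpoints feasible on a single node; substantially larger checkpoints typically also need quantization to fit in the available GPU memory.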
Related
Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude 3 Opus. It emphasizes training enhancements, data quality, and the competition between open and closed-source models.
How to run an LLM on your PC, not in the cloud, in less than 10 minutes
You can easily set up and run large language models (LLMs) on your PC using tools like Ollama, LM Studio, and Llama.cpp. Ollama supports AMD GPUs and AVX2-compatible CPUs, with straightforward installation across different systems; it offers commands for managing models and now supports select AMD Radeon cards (see the sketch after this list).
Gemma 2 on AWS Lambda with Llamafile
Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.
Groq Supercharges Fast AI Inference for Meta Llama 3.1
Groq launches Llama 3.1 models with LPU™ AI technology on GroqCloud Dev Console and GroqChat. Mark Zuckerberg praises ultra-low-latency inference for cloud deployments, emphasizing open-source collaboration and AI innovation.
Meta releases an open-weights GPT-4-level AI model, Llama 3.1 405B
Meta has launched Llama 3.1 405B, a free AI language model with 405 billion parameters, challenging closed AI models. Users can download it for personal use, promoting open-source AI principles. Mark Zuckerberg endorses this move.
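As a concrete taste of the Ollama workflow from the second item above: once the daemon is installed and a model has been pulled (e.g. with ollama pull), it exposes a local REST API. A minimal Python sketch, assuming a default install listening on localhost:11434 and an already-pulled model named "llama3" (the model name is an illustrative choice):

```python
# Query a locally running Ollama server via its documented REST API.
# Assumes: Ollama is installed and running, and "llama3" has already
# been pulled (the model name here is illustrative).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize what vLLM is in one sentence.",
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```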
Your service looks interesting, but I think you need to be more transparent about your infrastructure. To which "inference providers" do you proxy, and when? Who is hosting the GPU clusters?
Also, a privacy policy and ToS document are pretty important, even at this stage.
Unfortunately, I got an error signing up: "You have reached your limit of 500 users. If you need more users, please use a Production instance."
Probably need to know more about you. Main concerns are: "Can this be used for production?", "Will you be around in a year?", "Will you return 429s if you get too popular?", and of course likely pricing.
Application error: a server-side exception has occurred (see the server logs for more information).
Digest: 1138364229
It's in the title, which is not immediately visible on mobile.
I only get a 503 error response now when I try to load the page.
2) ...
3) Profit
what version llama are you
I am an instance of Llama 2, an artificial intelligence model developed by Meta AI.
how many parameters
I am a Llama 2 model with 7 billion parameters.
Attempt 2
which llama are you exactly
I am a variant of the LLaMA model, specifically LLaMA 1.5B.
Here's a brief overview of the different LLaMA models:
LLaMA 7B: This is the largest and most powerful LLaMA model, with 7 billion parameters. It's the most advanced and capable model in the LLaMA family.
LLaMA 3B: This model has 3 billion parameters and is a smaller but still very capable version of the LLaMA model.
LLaMA 1.5B: This is the model I'm based on, with 1.5 billion parameters. It's a smaller and more efficient version of the LLaMA model, but still very knowledgeable and capable.
LLaMA 650M: This is the smallest LLaMA model, with 650 million parameters. It's a more lightweight and efficient version of the model, but still useful for many tasks.
Is 3.1 this bad at hallucinations, or are you just serving something else?
This just keeps on giving
when does your training data end
" My training data is a massive corpus of text that was available to me up to a certain point in time. My training data is cut off at December 2021."