July 24th, 2024

Show HN: We made glhf.chat – run almost any open-source LLM, including 405B

The platform runs a wide range of large language models from Hugging Face repo links, using vLLM and a custom GPU scheduler. It is free during the beta, with plans for competitive pricing afterward by running models multi-tenant.

The platform lets users run almost any open-source large language model by pasting a link to its Hugging Face repo. It uses vLLM and a custom-built GPU scheduler to support models such as Meta Llama 3.1, Qwen 2, Mixtral 8x, Gemma 2, Deepseek V2 Coder Lite, Phi-3, and more. Users can access up to eight Nvidia A100 80GB GPUs. The service is free during the beta while the team refines operations and pricing. Once out of beta, the platform aims to offer competitive pricing compared with major cloud GPU vendors by running models multi-tenant. For popular models, the platform proxies to always-on inference providers; for custom models, it sets up and shuts down clusters on demand.
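The routing described above (proxy for popular models, on-demand clusters for everything else) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the platform's actual code: the `ALWAYS_ON` set, the `Route` type, and the function names are all hypothetical, and the post does not say which models count as "popular".

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical: repos assumed to be served by always-on inference providers.
# The real list is not given in the post; this entry is only an example.
ALWAYS_ON = {"meta-llama/Meta-Llama-3.1-405B-Instruct"}


@dataclass(frozen=True)
class Route:
    repo_id: str   # "owner/name" as used on Hugging Face
    backend: str   # "proxy" or "on_demand_cluster" (labels are assumptions)


def repo_id_from_link(link: str) -> str:
    """Extract 'owner/name' from a pasted Hugging Face repo link."""
    parsed = urlparse(link)
    if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
        raise ValueError(f"not a Hugging Face link: {link}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"link does not name a repo: {link}")
    # Ignore trailing segments such as /tree/main.
    return f"{parts[0]}/{parts[1]}"


def choose_route(link: str) -> Route:
    """Proxy popular models; start a cluster on demand for the rest."""
    repo_id = repo_id_from_link(link)
    backend = "proxy" if repo_id in ALWAYS_ON else "on_demand_cluster"
    return Route(repo_id, backend)
```

In this sketch, a link to a repo in `ALWAYS_ON` resolves to the `proxy` backend, while any other valid Hugging Face link resolves to `on_demand_cluster`. How the real service schedules GPUs or tears clusters down is not described in the post and is left out here.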

Related

Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU

The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude3 Opus. It emphasizes training enhancements, data quality, and the competition between open and closed-source models.

How to run an LLM on your PC, not in the cloud, in less than 10 minutes

You can easily set up and run large language models (LLMs) on your PC using tools like Ollama, LM Suite, and Llama.cpp. Ollama supports AVX2-compatible CPUs and select AMD Radeon GPUs, with straightforward installation across different systems and commands for managing models.

Gemma 2 on AWS Lambda with Llamafile

Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.

Groq Supercharges Fast AI Inference for Meta Llama 3.1

Groq launches Llama 3.1 models with LPU™ AI technology on GroqCloud Dev Console and GroqChat. Mark Zuckerberg praises ultra-low-latency inference for cloud deployments, emphasizing open-source collaboration and AI innovation.

Meta releases an open-weights GPT-4-level AI model, Llama 3.1 405B

Meta has launched Llama 3.1 405B, a free AI language model with 405 billion parameters, challenging closed AI models. Users can download it for personal use, promoting open-source AI principles. Mark Zuckerberg endorses this move.

25 comments
By @vlugorilla - 5 months
Please, accept cryptocurrency payments! I've found very few LLM providers that can be paid with crypto, and most of them are of bad quality.
By @imiric - 5 months
Congrats on the launch!

Your service looks interesting, but I think you need to be more transparent about your infrastructure. To which "inference providers" do you proxy to, and when? Who is hosting the GPU clusters?

Also, a privacy policy and ToS document are pretty important, even at this stage.

By @tspng - 5 months
Congratulations on the launch!

Unfortunately, I got an error signing up: "You have reached your limit of 500 users. If you need more users, please use a Production instance."

By @langcss - 5 months
A quick play and the UX is great. If you are allowing anyone to use 405B by API then you beat Groq to that :) they have limited the access.

Probably need to know more about you. Main concerns are "can this be used for production" and "will you be around in a year" and "will you likely have 429s if you get too popular?" and of course likely pricing.

By @nunobrito - 5 months
Registration limit: "You have reached your limit of 500 users. If you need more users, please use a Production instance."
By @jdthedisciple - 5 months
Probably a really dumb question and I missed it somehow, but who is paying for this if it's free?
By @Tepix - 5 months
Looks promising! However, no privacy policy is a total show stopper.
By @tcsenpai - 5 months
That's really amazing. How can you afford to run it for free? Tbh I'd pay for this more willingly than Perplexity Pro.
By @sweca - 5 months
If you guys can beat Together AI pricing on Llama 3.1 405B, you will be my favourite AI provider.
By @SuperHeavy256 - 5 months
I tried loading Mistral's Mamba Codestral. Didn't work. Got the error: "Application error: a client-side exception has occurred (see the browser console for more information)."
By @renewiltord - 5 months
It worked very well. Llama-3.1-405B is really impressive! But then the whole thing stopped working, as more people started using it, I suppose. Can't wait for the paid offering.
By @codetrotter - 5 months
I was chatting with the 405B model and got this after a few messages:

Application error: a server-side exception has occurred (see the server logs for more information).

Digest: 1138364229

By @android521 - 5 months
Do you have API access? Most people need to build their own apps with an API and will choose you if you have a cheaper hosting/API price.
By @GTP - 5 months
I just tried opening the link, and got a 500 error. Too many connections from hn?
By @akadeb - 5 months
Could you clarify what running models multi-tenant means?
By @jcuenod - 5 months
Seems like setting max_tokens crashes your endpoint.
By @tcsenpai - 5 months
Giving 500 error here, is it me?
By @lxe - 5 months
Why AWQ instead of exl2?
By @tardismechanic - 5 months
In case you were wondering, it supposedly stands for Good Luck Have Fun.

It's in the title which is not immediately visible on mobile.

By @Fantastic9881 - 5 months
Fantastic. Thanks.
By @codetrotter - 5 months
Aaaand it's gone :(

I only get 503 Error response now when I try to load the page.

By @kcb - 5 months
1) Give access to expensive hardware for free

2) ...

3) Profit

By @ec109685 - 5 months
Pretty bait and switchy to let the person sign up and then demand an invite code.
By @shmatt - 5 months
attempt 1 with llama 3.1 405b

what version llama are you

I am an instance of Llama 2, an artificial intelligence model developed by Meta AI.

how many parameters

I am a Llama 2 model with 7 billion parameters.

Attempt 2

which llama are you exactly

I am a variant of the LLaMA model, specifically LLaMA 1.5B.

Here's a brief overview of the different LLaMA models:

LLaMA 7B: This is the largest and most powerful LLaMA model, with 7 billion parameters. It's the most advanced and capable model in the LLaMA family.

LLaMA 3B: This model has 3 billion parameters and is a smaller but still very capable version of the LLaMA model.

LLaMA 1.5B: This is the model I'm based on, with 1.5 billion parameters. It's a smaller and more efficient version of the LLaMA model, but still very knowledgeable and capable.

LLaMA 650M: This is the smallest LLaMA model, with 650 million parameters. It's a more lightweight and efficient version of the model, but still useful for many tasks.

is 3.1 this bad at hallucinations? or are you just serving something else

This just keeps on giving

when does your training data end

"My training data is a massive corpus of text that was available to me up to a certain point in time. My training data is cut off at December 2021."