June 22nd, 2024

How to run an LLM on your PC, not in the cloud, in less than 10 minutes

You can easily set up and run large language models (LLMs) on your PC using tools like Ollama, LM Studio, and llama.cpp. Ollama installs straightforwardly across operating systems, supports AVX2-compatible CPUs alongside Nvidia GPUs and select AMD Radeon cards, and offers Docker-style commands for managing models.

Read original article

In less than 10 minutes, you can run a large language model (LLM) locally on your PC using tools like Ollama, LM Studio, and llama.cpp, which make setup easy on Windows, Linux, and macOS. While dedicated accelerators such as Nvidia GPUs deliver the best performance, Ollama also supports select AMD GPUs and AVX2-compatible CPUs. Installation is straightforward on every supported operating system, and you can start with models like Mistral 7B by running a single command in PowerShell or a terminal emulator.

Quantization reduces the memory a model needs, allowing LLMs to run on systems with limited resources. Ollama provides commands for managing, updating, and removing installed models, similar to the Docker CLI, and now supports select AMD Radeon 6000- and 7000-series cards. Stay tuned to The Register for more insights on utilizing LLMs.
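
For a concrete taste, here is a minimal sketch (not from the article itself) of querying a local Ollama server from Python, assuming Ollama is installed and a model has been pulled with "ollama pull mistral"; http://localhost:11434 is Ollama's default API endpoint, and the prompt is a placeholder.

    # Minimal sketch: query a locally running Ollama server from Python.
    # Assumes Ollama is installed and "ollama pull mistral" has completed;
    # http://localhost:11434 is Ollama's default API endpoint.
    import requests

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",  # any model you have pulled locally
            "prompt": "Explain quantization in one sentence.",
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    print(response.json()["response"])  # the generated text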

Related

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Using ONNX Runtime with WebGPU and WebAssembly in browsers achieves 20x speedup for background removal, reducing server load, enhancing scalability, and improving data security. ONNX models run efficiently with WebGPU support, offering near real-time performance. Leveraging modern technology, IMG.LY aims to enhance design tools' accessibility and efficiency.

GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller

The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.

Delving into ChatGPT usage in academic writing through excess vocabulary

A study by Dmitry Kobak et al. examines ChatGPT's impact on academic writing, finding increased usage in PubMed abstracts. Concerns arise over accuracy and bias despite advanced text generation capabilities.

Why are module implementation and signatures separated in OCaml? (2018)

Separating module implementations from signatures in OCaml enables scalable builds, generation of cmi files, and streamlined interface modifications. Keeping abstraction distinct from implementation strengthens modular programming and makes systems easier to reason about.

Detecting hallucinations in large language models using semantic entropy

Researchers devised a method to detect hallucinations in large language models like ChatGPT and Gemini by measuring semantic entropy over sampled answers. Filtering out unreliable, high-entropy answers significantly improves accuracy.
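
To make the idea concrete, here is a toy sketch (not the authors' implementation, which clusters answers with a bidirectional-entailment model; naive string normalization stands in for that here): sample several answers to one question, group semantically equivalent ones, and compute the entropy of the cluster distribution. High entropy suggests confabulation.

    # Toy illustration of semantic entropy (not the paper's code).
    # Real systems cluster answers with an entailment model; here,
    # lowercased exact match stands in as the equivalence test.
    import math
    from collections import Counter

    def semantic_entropy(answers):
        """Entropy over clusters of semantically equivalent answers."""
        clusters = Counter(a.strip().lower() for a in answers)
        total = sum(clusters.values())
        return -sum((n / total) * math.log(n / total) for n in clusters.values())

    # Consistent answers -> low entropy (answer is likely reliable)
    print(semantic_entropy(["Paris", "paris", "Paris"]))     # 0.0
    # Scattered answers -> high entropy (likely hallucination)
    print(semantic_entropy(["Paris", "Lyon", "Marseille"]))  # ~1.10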

6 comments
By @nijaar - 4 months
It could be even easier: we implemented a two-click-install, open-source local AI manager (+RAG and other cool stuff) for Windows / Mac / Linux. You can check it out at shinkai.com or check out the code at https://github.com/dcspark/shinkai-apps
By @butz - 4 months
Depending on your internet speed, with llamafile you can do it even faster. Go to https://github.com/Mozilla-Ocho/llamafile, find the Quickstart section, and have fun. Scroll down a bit further for more models.
By @boboche - 4 months
I use LM Studio (https://lmstudio.ai/) for my lazy setups. The 10 minutes goes to downloading the actual models.
By @sgt101 - 4 months
Any M-series Mac with 16GB+ can do this too.
By @jackdawipper - 4 months
ollama. download a few models. bobs your uncle. simple as.

better still you can then use python to call it with langchain ChatOllama and build anything you want, with a little help from claude and chatgpt, or codeqwen if you want to do it all locally (see the sketch below).

absolute AI don. impress the ladies with that one, you'll be beating them off with a stick when they see it in action.

just need plenty of VRAM after that.
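
A minimal sketch of that LangChain route, assuming the langchain-ollama package (pip install langchain-ollama) and a model already pulled locally; the model name and prompt are placeholders:

    # Sketch: drive a local Ollama model from Python via LangChain's
    # ChatOllama wrapper. Assumes langchain-ollama is installed and a
    # model (here "mistral") has been pulled with Ollama.
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model="mistral")  # any locally pulled model
    reply = llm.invoke("Write a haiku about running LLMs locally.")
    print(reply.content)

From there, the same object plugs into chains, agents, and RAG pipelines like any other LangChain chat model.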