August 1st, 2024

PyTorch – Torchchat: Chat with LLMs Everywhere

The torchchat GitHub repository enables execution of large language models using PyTorch on multiple platforms, supporting models like Llama 3 and Mistral, with features for chatting, text generation, and evaluation.

The torchchat GitHub repository lets users run large language models (LLMs) with PyTorch across Python, C/C++, and mobile (iOS and Android) environments. It supports popular models such as Llama 3, Llama 2, and Mistral. Key features include multiple execution modes (Eager, Compile, AOT Inductor, ExecuTorch), cross-platform support (Linux, macOS, Android, iOS), and several quantization schemes for reducing model size and latency. The repository provides a command-line interface for chatting, text generation, model evaluation, and artifact management.
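
As a rough illustration of how these execution modes surface in the CLI, the commands below are a minimal sketch based on the repository's README at the time of writing; exact flags, quantization config values, and model aliases may have changed since, so check the repo for current syntax:

```bash
# Eager/compiled generation from a downloaded model
python3 torchchat.py generate llama3.1 --prompt "Explain beam search in one paragraph"

# Quantized generation: --quantize takes a JSON config (values here are illustrative)
python3 torchchat.py generate llama3.1 \
  --quantize '{"linear:int4": {"groupsize": 256}}' \
  --prompt "Hello"

# Export a compiled artifact for AOT Inductor (desktop) or ExecuTorch (mobile)
python3 torchchat.py export llama3.1 --output-dso-path llama3.1.so
python3 torchchat.py export llama3.1 --output-pte-path llama3.1.pte
```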

To install torchchat, users clone the repository, set up a virtual environment, and install the dependencies. Usage examples cover chatting with a model, generating text from a prompt, and running the app in a browser, and the repository also documents deploying models to mobile devices. Model quality can be evaluated with a dedicated command built on the lm_eval library, as sketched below.
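
Concretely, a typical first session looks something like the following sketch, which assumes the setup script and subcommand names from the README at the time of writing; consult the repository for the current commands:

```bash
# Clone the repository and set up an isolated environment
git clone https://github.com/pytorch/torchchat.git
cd torchchat
python3 -m venv .venv && source .venv/bin/activate
./install_requirements.sh

# Fetch a model, then chat, generate, or serve a browser UI
python3 torchchat.py download llama3.1
python3 torchchat.py chat llama3.1
python3 torchchat.py browser llama3.1

# Evaluate via the lm_eval-backed harness (task name is illustrative)
python3 torchchat.py eval llama3.1 --tasks hellaswag
```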

The project encourages community contributions and provides guidelines for participation. It is released under the BSD 3-Clause license, with some additional code covered by MIT and Apache licenses. For further details, users can visit the torchchat GitHub repository.

Related

LightRAG: The PyTorch Library for Large Language Model Applications

The LightRAG PyTorch library helps build retrieval-augmented generation (RAG) pipelines for LLM applications such as chatbots and code generation. It installs via `pip install lightrag`, with documentation at lightrag.sylph.ai.

MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use

The GitHub repository contains the MobileLLM code for optimizing sub-billion-parameter language models for on-device applications. It covers design considerations, code guidelines, results on common-sense reasoning benchmarks, acknowledgements, and licensing details, with support available through the repository's maintainers.

Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c

The GitHub repository hosts Andrej Karpathy's "llm.c" project, which implements large language model training in plain C/CUDA with minimal dependencies, emphasizing the pretraining of GPT-2 and GPT-3 class models.

Exo: Run your own AI cluster at home with everyday devices

The "exo" project on GitHub guides users in creating a home AI cluster with features like LLaMA support, dynamic model splitting, ChatGPT API, and MLX inference. Installation involves cloning the repository and installing requirements. iOS implementation may lag.

Chat with Meta Llama 3.1 405B

The article covers the availability of Meta Llama 3.1 405B on Replicate, which enables cloud-based execution of the model. Users can chat with Meta Llama 3.1, call it via an API, clone example projects from GitHub, and tune generation settings.

AI: What people are saying
The comments on the torchchat GitHub repository reveal a mix of curiosity and skepticism regarding its capabilities and use cases.
  • Users are comparing torchchat to other platforms like Ollama, questioning when to use each.
  • There is a debate about the effectiveness of smaller models for general chat versus narrow tasks.
  • Some commenters express interest in practical applications, such as integrating LLMs into their workflows.
  • Technical inquiries about performance on different hardware, particularly CPUs, are raised.
  • Overall, there is a mix of enthusiasm for the tool's potential and concerns about its limitations.
10 comments
By @fbuilesv - 6 months
I'm not well versed in LLMs, can someone with more experience share how this compares to Ollama (https://ollama.com/)? When would I use this instead?
By @gleenn - 6 months
This looks awesome, the instructions are basically a one-liner to get a Python program to start up a chat program, and it's optimized for a lot of hardware you can run locally like if you have an Nvidia GPU or Apple M processor. Super cool work bringing this functionality to local apps and to just play with a lot of popular models. Great work
By @boringg - 6 months
Can someone explain the use case? Is it so that I can run LLMs more readily in terminal instead of having to use a chat interface?

I'm not saying it isn't impressive being able to swap but I have trouble understanding how this integrates into my workflow and I don't really want to put much effort into exploring given that there are so many things to explore these days.

By @ipunchghosts - 6 months
I have been using ollama and generally not that impressed with these models for doing real work. I can't be the only person who thinks this.
By @daghamm - 6 months
Does pytorch have better acceleration on x64 CPUs nowadays?

Last time I played with LLMs on CPU with pytorch you had to replace some stuff with libraries from Intel otherwise your performance would be really bad.

By @ein0p - 6 months
Selling it as a “chat” is a mistake imo. Chatbots require very large models with a lot of stored knowledge about the world. Small models are useful for narrow tasks, but they are not, and will never be, useful for general domain chat
By @suyash - 6 months
This is cool, how can I go about using this for my own dataset - .pdf, .html files etc?
By @jiratemplates - 6 months
looks great
By @aklgh - 6 months
A new PyTorch feature. Who knew!

How about making libtorch a first class citizen without crashes and memory leaks? What happened to the "one tool, one job" philosophy?

As an interesting thought experiment: Should PyTorch be integrated into systemd or should systemd be integrated into PyTorch? Both seem to absorb everything else like a black hole.