Exo: Run your own AI cluster at home with everyday devices
The "exo" project on GitHub guides users in creating a home AI cluster with features like LLaMA support, dynamic model splitting, ChatGPT API, and MLX inference. Installation involves cloning the repository and installing requirements. iOS implementation may lag.
The GitHub repository for the "exo" project offers details on setting up an AI cluster at home using everyday devices. Key features include support for popular models like LLaMA, dynamic model splitting based on network and device resources, automatic device discovery, and a ChatGPT-compatible API. Installation is recommended from source: clone the repository and install the requirements via pip. The documentation includes examples for multi-device usage, a ChatGPT-like web interface on each device, and an API endpoint for model interaction. Supported inference engines are MLX, tinygrad, and llama.cpp, with networking handled over gRPC. Notably, the iOS implementation is rapidly evolving but may lag behind the Python version. The GitHub repository provides comprehensive details.
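The summary mentions a ChatGPT-compatible API but gives no request example. Below is a minimal sketch of what a call might look like, assuming the API follows the OpenAI chat-completions schema and listens on localhost port 8000; the port and the model name "llama-3-8b" are illustrative assumptions, not details from the article.

```python
# Minimal sketch of calling exo's ChatGPT-compatible endpoint.
# Assumptions (not confirmed by the summary above): the API listens on
# localhost:8000 and follows the OpenAI /v1/chat/completions schema;
# the model name "llama-3-8b" is illustrative.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "What is exo?"}],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the API is ChatGPT-compatible, existing OpenAI client libraries should also work by pointing their base URL at the local endpoint.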
Related
Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude3 Opus. It emphasizes training enhancements, data quality, and the competition between open- and closed-source models.
Gemma 2 on AWS Lambda with Llamafile
Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.
MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use
The GitHub repository contains MobileLLM code optimized for sub-billion-parameter language models for on-device applications. It includes design considerations, code guidelines, results on common-sense reasoning tasks, acknowledgements, and licensing details. Contact the repository maintainers for support.
Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c
The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.
Distributed LLama3 Inference
The GitHub repository for `Cake` hosts a Rust implementation of LLama3 distributed inference, aiming to utilize consumer hardware for running large models across various devices. Instructions and details are available for setup and optimizations.
- Concerns about network bottlenecks and performance issues when running models over a home network.
- Questions about the feasibility and practicality of using the project on non-Apple devices and the lack of benchmarks.
- Discussions on the potential benefits of local AI compute for privacy and utilizing unused CPU resources.
- Interest in the project's potential for crowdsourcing and collaborative model training.
- Security and licensing concerns are raised by some users.
"This enables you to run larger models than you would be able to on any single device."
No further explanation on how this is supposed to work? If some layers of the neural network are on device A and some layers are on device B, wouldn't that mean that for every token generated, all output data from the last layer on device A has to be transferred to device B?
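The commenter's concern can be put in rough numbers: in a layer-wise (pipeline) split, each generated token requires shipping one hidden-state vector from the last layer on one device to the first layer on the next. A back-of-envelope sketch follows, where the hidden size, dtype, and generation speed are assumed values for a LLaMA-70B-class model, not figures from the project:

```python
# Back-of-envelope estimate of per-token network traffic in a pipeline split.
# Assumed values (illustrative, not from the article): a LLaMA-70B-class
# model with hidden size 8192 and fp16 activations (2 bytes per element).
hidden_size = 8192          # elements in the hidden-state vector per token
bytes_per_element = 2       # fp16
per_token_bytes = hidden_size * bytes_per_element  # 16 KiB per split point

tokens_per_second = 10      # assumed target generation speed
bandwidth_needed = per_token_bytes * tokens_per_second  # bytes per second

print(f"{per_token_bytes / 1024:.0f} KiB per token")
print(f"~{bandwidth_needed * 8 / 1e6:.1f} Mbit/s at {tokens_per_second} tok/s")
```

Under these assumptions the raw bandwidth per split point is small, on the order of a megabit per second; per-token round-trip latency, rather than throughput, is the more likely bottleneck on a home network.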
When there’s only one device left on the network, will it sing Daisy Bell?
So by definition you need (1) good internet (20 Mb/s+) and (2) good devices.
This thing will not go any further than a cool demo on Twitter. Please prove me wrong.