TensorDict: A GPU-accelerated Python dictionary
TensorDict is a PyTorch library for managing tensor data efficiently, enhancing code readability and modularity. It supports metadata, nested structures, and various tensor operations, with installation via pip or conda.
TensorDict is a library developed by the PyTorch team that functions as a dictionary-like class, enabling efficient management of tensor data in a structured manner. It aims to enhance code readability, compactness, and modularity by abstracting operations on batches of tensors, making it suitable for various machine learning applications. Key features include the ability to carry metadata for easy querying, support for nested structures, and the capability to store non-tensor data. TensorDict supports common tensor-like operations such as arithmetic, indexing, reshaping, stacking, and concatenating. It also offers distributed capabilities for both synchronous and asynchronous data transfer, compatibility with functorch for functional programming, and faster serialization methods for model parameters compared to standard PyTorch. Additionally, it facilitates efficient preprocessing of large datasets and lazy allocation of tensors. TensorDict can be installed via pip or conda, and it is licensed under the MIT License. Users are encouraged to cite TensorDict in their work using the provided BibTeX entry. For further details, the TensorDict documentation is available online.
Related
Show HN: txtai: open-source, production-focused vector search and RAG
The txtai tool is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and utilize language models for diverse tasks. Txtai is open-source and supports multiple programming languages.
Dynolog: Open-Source System Observability
Dynolog is an open-source observability tool for optimizing AI applications on distributed CPU-GPU systems. It offers continuous monitoring of performance metrics, integrates with PyTorch Profiler and Kineto CUDA profiling library, and supports GPU monitoring for NVIDIA GPUs and CPU events for Intel and AMD CPUs. Developed in Rust, Dynolog focuses on Linux platforms to enhance AI model observability in cloud environments.
PyTorch Lightning: A Comprehensive Hands-On Tutorial
This tutorial explores using PyTorch Lightning to streamline deep learning model development. It simplifies training loops, supports multi-GPU training, and enhances experiment tracking. The tutorial covers setup, dataset handling, and workflow comparison.
txtai 7.3 released: Adds new RAG Web Apps and streaming LLM/RAG support
The txtai 7.3.0 release introduces an open-source embeddings database for semantic search and language model workflows. It supports various data types, pipelines for tasks like summarization, and can be built with Python or YAML.
txtai: Open-source vector search and RAG for minimalists
txtai is a versatile tool for semantic search, LLM orchestration, and language model workflows. It offers features like vector search with SQL, topic modeling, and multimodal indexing, supporting various tasks with language models. Built with Python and open-source under Apache 2.0 license.
With TensorDict, you can apply the same operation to a collection of tensors with ease, eliminating the need for tedious loops and improving your code's readability. Our API is designed to be intuitive, mirroring the torch.Tensor API, so you can reshape, split, concatenate, clone, and more with familiar syntax.
But that's not all: TensorDict also optimizes operations under the hood, resulting in significant speedups. For example, casting a large collection of tensors to GPU can be up to 2x faster than using a regular Python loop. We use fused CUDA kernels for arithmetic ops, so you can code an optimizer like Adam in 5 lines of code (as you would for a single tensor), and it will run much faster in eager and compile modes than a regular for loop over your tensors.
Plus, our library comes with a GPU-accelerated dataclass (@tensorclass) for even more performance gains.
Other key features include:
- torch.compile (PT2) compatibility
- Consolidation into a single storage for fast node-to-node communication
- Support for memory-mapping and shared memory
- TensorDict can be used as a lightweight substitute for a dataloader thanks to `TensorDict.map_iter`, with preprocessing done on-device, outperforming regular dataloading speed by orders of magnitude (check the tutorials!)
We're excited to share TensorDict with the community and invite your feedback and contributions!
Try it out and let us know what you think!