August 2nd, 2024

TensorDict: A GPU-accelerated Python dictionary

TensorDict is a PyTorch library for managing tensor data efficiently, enhancing code readability and modularity. It supports metadata, nested structures, and various tensor operations, with installation via pip or conda.

TensorDict is a library developed by the PyTorch team that provides a dictionary-like class for managing tensor data in a structured manner. It aims to improve code readability, compactness, and modularity by abstracting operations on batches of tensors, which makes it suitable for a range of machine learning applications. Key features include the ability to carry metadata for easy querying, support for nested structures, and the capability to store non-tensor data. TensorDict supports common tensor-like operations such as arithmetic, indexing, reshaping, stacking, and concatenating.

It also offers distributed capabilities for both synchronous and asynchronous data transfer, compatibility with functorch for functional programming, and faster serialization of model parameters than standard PyTorch. Additionally, it facilitates efficient preprocessing of large datasets and lazy allocation of tensors. TensorDict can be installed via pip or conda and is licensed under the MIT License. Users are encouraged to cite TensorDict in their work using the provided BibTeX entry. For further details, the TensorDict documentation is available online.

Show HN: txtai: open-source, production-focused vector search and RAG

The txtai tool is a versatile embeddings database for semantic search, LLM orchestration, and language model workflows. It supports vector search with SQL, RAG, topic modeling, and more. Users can create embeddings for various data types and utilize language models for diverse tasks. Txtai is open-source and supports multiple programming languages.

Dynolog: Open-Source System Observability

Dynolog is an open-source observability tool for optimizing AI applications on distributed CPU-GPU systems. It offers continuous monitoring of performance metrics, integrates with PyTorch Profiler and Kineto CUDA profiling library, and supports GPU monitoring for NVIDIA GPUs and CPU events for Intel and AMD CPUs. Developed in Rust, Dynolog focuses on Linux platforms to enhance AI model observability in cloud environments.

PyTorch Lightning: A Comprehensive Hands-On Tutorial

This tutorial explores using PyTorch Lightning to streamline deep learning model development. It simplifies training loops, supports multi-GPU training, and enhances experiment tracking. The tutorial covers setup, dataset handling, and workflow comparison.

txtai 7.3 released: Adds new RAG Web Apps and streaming LLM/RAG support

The txtai 7.3.0 release introduces an open-source embeddings database for semantic search and language model workflows. It supports various data types, pipelines for tasks like summarization, and can be built with Python or YAML.

txtai: Open-source vector search and RAG for minimalists

txtai is a versatile tool for semantic search, LLM orchestration, and language model workflows. It offers features like vector search with SQL, topic modeling, and multimodal indexing, supporting various tasks with language models. It is built with Python and is open source under the Apache 2.0 license.

2 comments

By @vmoens - 9 months

If you're tired of dealing with boilerplate code when working with multiple tensors in PyTorch, or struggle with scaling operations to large collections of tensors, look no further! TensorDict is a PyTorch primitive that makes it easy to build dictionaries of tensors (and non-tensors) with a focus on composability and performance.

With TensorDict, you can apply the same operation to a collection of tensors with ease, eliminating the need for tedious loops and improving your code's readability. Our API is designed to be intuitive, mirroring the torch.Tensor API, so you can reshape, split, concatenate, clone, and more with familiar syntax.

But that's not all - TensorDict also optimizes operations under the hood, resulting in significant speedups. For example, casting a large collection of tensors to GPU can be up to 2x faster than using a regular Python loop. We use fused CUDA kernels for arithmetic ops, so you can code an optimizer like Adam in 5 lines of code (as you would for a single tensor), and it will run much faster in eager and compile modes than a regular for loop over your tensors.

Plus, our library comes with a GPU-accelerated dataclass (@tensorclass) for even more performance gains.

Other key features include:
- torch.compile (PT2) compatibility
- Consolidation into a single storage for fast node-to-node communication
- Support for memory-mapping and shared memory
- TensorDict can be used as a lightweight substitute for a dataloader thanks to `TensorDict.map_iter`, and can do preprocessing on-device, outperforming regular dataloading speed by orders of magnitude (check the tutorials!)

We're excited to share TensorDict with the community and invite your feedback and contributions!

Try it out and let us know what you think!

By @albertbou - 9 months

A very convenient tool for any machine learning project in PyTorch! Speaking from experience.

Related

Show HN: txtai: open-source, production-focused vector search and RAG

Dynolog: Open-Source System Observability

PyTorch Lightning: A Comprehensive Hands-On Tutorial

txtai 7.3 released: Adds new RAG Web Apps and streaming LLM/RAG support

txtai: Open-source vector search and RAG for minimalists
