Show HN: TensorDict, GPU-accelerated Python dicts


2 years ago

If you're working with multiple torch.Tensor objects, you've probably got a lot of boilerplate code you'd like to get rid of. And if the collection of tensors is large enough, chances are you're also struggling to find the right schedule for dispatching operations across all of them. That's where TensorDict comes in - it's a PyTorch primitive that makes it easy to build dictionaries of tensors (and non-tensors) with a focus on composability and performance.

Convenience: With TensorDict, you can apply an op to a whole collection of tensors at once, which removes tedious loops and makes the code easier to read. The API is designed to be intuitive and mirrors that of torch.Tensor (e.g., `TensorDict.cuda()`, `TensorDict.chunk()`, etc.), so you can reshape, split, concatenate, clone, and more with familiar syntax.

Speed: That's not all - TensorDict also optimizes operations under the hood, resulting in significant speedups. For example, casting a large collection of tensors to GPU can be up to 2x faster than using a regular Python loop. We use fused CUDA kernels for arithmetic ops, so you can write an optimizer like Adam in 5 lines of code (as you would for a single tensor), and it will run much faster in both eager and compiled modes than a regular for loop over your tensors.

Plus, our library comes with a GPU-accelerated dataclass (@tensorclass) for those who want to fix the fields of their data structure up front.

Other key features include:

- torch.compile (PT2) compatibility

- Consolidation into a single storage for fast node-to-node communication

- Support for memory-mapping and shared memory

TensorDict can be used as a lightweight substitute for a dataloader thanks to `TensorDict.map_iter`, and it can run preprocessing on-device, outperforming regular dataloading speed by orders of magnitude (check the tutorials!).

Finally, if you're worried about added complexity, remember that TensorDict can be used as a drop-in replacement for `dict` (as it accepts non-tensor data too).

We're excited to share TensorDict with the community and invite your feedback and contributions!

Try it out and let us know what you think!