Convenience: With TensorDict, you can apply an operation to a whole collection of tensors in one call, eliminating tedious loops and making the code more readable. The API is designed to be intuitive and mirrors that of `torch.Tensor` (e.g., `TensorDict.cuda()`, `TensorDict.chunk()`, etc.), so you can reshape, split, concatenate, clone, and more with familiar syntax.
Speed: TensorDict also optimizes operations under the hood, which can yield significant speedups. For example, casting a large collection of tensors to GPU can be up to 2x faster than with a regular Python loop. We use fused CUDA kernels for arithmetic ops, so you can write an optimizer like Adam in 5 lines of code (as you would for a single tensor), and it will run much faster, in both eager and compile modes, than a regular for loop over your tensors.
Plus, our library comes with a GPU-accelerated dataclass, `@tensorclass`, for those who want to set the content of their data structure in stone.
Other key features include:
- `torch.compile` (PT2) compatibility
- Consolidation into a single storage for fast node-to-node communication
- Support for memory-mapping and shared memory
TensorDict can also serve as a lightweight substitute for a dataloader thanks to `TensorDict.map_iter`, and it can do preprocessing on-device, outperforming regular data loading by orders of magnitude (check the tutorials!).
Finally, if you're worried about added complexity, remember that TensorDict can be used as a drop-in replacement for `dict` (as it accepts non-tensor data too).
We're excited to share TensorDict with the community and invite your feedback and contributions!
Try it out and let us know what you think!