I'd like to share my project, bridge-ds, a lightweight Python framework that simplifies how ML practitioners manage and interact with datasets.
Why bridge-ds? It abstracts away the repetitive parts of dataset handling in real-world ML workflows, while staying lean enough not to impose an opinionated workflow or unnecessary dependencies.
bridge-ds uses two complementary approaches:
- Macro-level: Treat your entire dataset like a DataFrame, filtering, sorting, and modifying it with familiar, intuitive operations.
- Micro-level: Efficiently handle individual samples with lazy loading, caching, remote data access, and straightforward browsing.
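To make the macro/micro split concrete, here is a minimal, stdlib-only sketch of the general pattern. It is not bridge-ds's actual API: `Dataset`, `Sample`, `fake_loader`, and their methods are hypothetical names chosen for illustration. The idea is that dataset-level operations (filter, sort) work only on cheap metadata, while each sample's payload is loaded lazily on first access and cached afterward.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import cached_property
from typing import Any, Callable


# Hypothetical micro-level unit (not bridge-ds's API): metadata is held eagerly,
# while the payload is fetched by `loader` on first access and then cached.
@dataclass
class Sample:
    sample_id: str
    meta: dict[str, Any]
    loader: Callable[[str], Any] = field(repr=False)

    @cached_property
    def data(self) -> Any:
        # Runs the (possibly slow or remote) loader at most once per sample.
        return self.loader(self.sample_id)


# Hypothetical macro-level container: DataFrame-like operations that only
# inspect metadata, so no payloads are loaded while filtering or sorting.
class Dataset:
    def __init__(self, samples: list[Sample]) -> None:
        self.samples = list(samples)

    def filter(self, pred: Callable[[dict[str, Any]], bool]) -> Dataset:
        return Dataset([s for s in self.samples if pred(s.meta)])

    def sort(self, key: str, reverse: bool = False) -> Dataset:
        return Dataset(sorted(self.samples, key=lambda s: s.meta[key], reverse=reverse))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, i: int) -> Sample:
        return self.samples[i]


# Usage: a fake loader that records which payloads were actually fetched.
calls: list[str] = []


def fake_loader(sample_id: str) -> str:
    calls.append(sample_id)
    return f"pixels-of-{sample_id}"


ds = Dataset([
    Sample("a", {"label": "cat", "width": 640}, fake_loader),
    Sample("b", {"label": "dog", "width": 320}, fake_loader),
    Sample("c", {"label": "cat", "width": 128}, fake_loader),
])

cats = ds.filter(lambda m: m["label"] == "cat").sort("width")
print([s.sample_id for s in cats.samples])  # ['c', 'a']
print(len(calls))  # 0: filtering and sorting never touched the payloads

first = cats[0]
print(first.data)  # pixels-of-c (loaded now)
print(first.data)  # pixels-of-c (served from the cache)
print(calls)  # ['c']: the loader ran exactly once
```

Keeping the two levels separate means cheap bulk operations never pay the cost of loading data, and only the samples you actually browse are fetched.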
You can find the project on GitHub[1], and the official documentation is also available[2].
This library is still in development, but I feel there's enough here to form some first impressions, and any feedback would be greatly appreciated!
[1] https://github.com/guybuk/bridge-ds
[2] https://bridge-ds.readthedocs.io/en/latest/