The commands are modeled one-for-one after git, but they can handle large unstructured datasets of images, video, audio, and text, as well as tabular data.
If you have a few minutes, go to https://github.com/Oxen-AI/oxen-release#-oxen and check it out!
Oxen is built in Rust with modern hashing algorithms and network protocols, so it can index hundreds of thousands of image, video, audio, or text files in seconds.
The vision is to make collaboration on data as easy as collaboration on code. It still feels like we are in the dark ages of iterating on machine learning datasets: downloading zip files, dumping them on S3, or using legacy technology like git-lfs.
We would also love to see a community grow around open source datasets, just like open source code. There is a web hub at https://www.oxen.ai where you can sign up for free.
Let us know what data you are training your models on today! Running models without the proper dataset infrastructure to iterate on can take the "learn" out of machine learning. With Oxen, we help close that loop.