Show HN: Oxen.ai – Fast Unstructured Data Version Control

16 points

3 years ago

Hi HN! This is Greg, founder of https://oxen.ai. Oxen at its core is a version control system built from the ground up and optimized for machine learning data. Just like other VCSs, Oxen can be a building block for a lot of workflows. We are a couple of ex-IBM Watson engineers who have seen the waves of AI and know the importance of matching the correct data to the correct model deployment in ML workflows.

The commands are modeled one-for-one after git, but can handle large unstructured datasets of images, videos, audio, text, as well as tabular data.

If you have a few minutes, go to https://github.com/Oxen-AI/oxen-release#-oxen and check it out!

Built in Rust, with modern hashing algorithms and network protocols, Oxen can index hundreds of thousands of images, videos, audio clips, and text files in seconds.
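
For anyone wondering what hash-based indexing means in practice, here is a rough Python sketch of the general idea. It is not Oxen's actual Rust implementation: BLAKE2b, the function names, and the file layout are all assumptions picked for illustration. Each file is reduced to a content hash, and two indexes are compared to find what changed.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a hex digest of the file's bytes, read in fixed-size chunks."""
    # BLAKE2b is an assumption for this sketch, not necessarily what Oxen uses.
    digest = hashlib.blake2b(digest_size=16)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths that were added, removed, or whose content hash differs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def demo() -> list[str]:
    """Index a tiny hypothetical dataset, edit one file, and report the diff."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        (root / "images" / "cat.jpg").write_bytes(b"fake image bytes v1")
        (root / "labels.csv").write_text("file,label\nimages/cat.jpg,cat\n")
        before = build_index(root)
        # Simulate replacing one image; the labels file is left untouched.
        (root / "images" / "cat.jpg").write_bytes(b"fake image bytes v2")
        after = build_index(root)
        return changed_paths(before, after)


if __name__ == "__main__":
    print(demo())
```

Only the file whose bytes changed shows up in the diff, which is why comparing hashes can be much cheaper than comparing file contents directly on large datasets.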

The vision is to make collaborating on data as easy as collaborating on code. It still feels like we are in the dark ages of iterating on machine learning datasets: downloading zip files, dumping them on S3, or using legacy technology like git-lfs.

We would also love to see a community grow around open source datasets, just like open source code. There is a web hub at https://www.oxen.ai where you can sign up for free.

Let us know what data you are training your models on today! Running models without the proper dataset infrastructure to iterate on can take the "learn" out of machine learning. With Oxen, we help close that loop.
