Putting large files in S3 (or a similar object store) or on NFS is probably the most common solution. For version control, however, Git is the de facto standard, and Git is not designed for versioning large files. If the large files are going to live on S3 anyway, why not use a tool that handles data versioning directly on top of S3? That is why we developed ArtiV (Artifact Versions), a version control tool for large files.
For more detail, I recommend reading this blog post: https://blog.infuseai.io/a-modern-approach-to-versioning-large-datasets-for-machine-learning-fca2f541dd85
Or check out our Git repository: https://github.com/InfuseAI/artiv