When building ML systems for industrial AI, we have learned that data inspection is critical throughout the ML development process. We are also big fans of the Hugging Face ecosystem.
That is why we built an integration for our data exploration tool, Spotlight, that lets you interactively explore Hugging Face datasets with one line of code.
Spotlight lets you leverage model results such as predictions and embeddings to gain a deeper understanding of data segments and model failure modes.
Currently, many NLP, CV, audio, and multimodal datasets are supported, both locally and on the Hub.
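To give an idea of what the "one line of code" looks like in practice, here is a minimal sketch. It assumes both packages are installed (for example via `pip install renumics-spotlight datasets`). The helper name `explore` and the choice of the `imdb` dataset with its `test` split are illustrative assumptions, not something prescribed by the integration; any supported dataset should work the same way.

```python
def explore(dataset_name: str = "imdb", split: str = "test"):
    """Load a Hugging Face dataset and open it in Spotlight.

    `explore`, the dataset name, and the split are hypothetical
    defaults chosen for illustration only.
    """
    # Imported lazily so this snippet can be pasted into a module
    # without requiring the packages at import time.
    import datasets
    from renumics import spotlight

    # Load the dataset from the Hub (or from the local cache).
    ds = datasets.load_dataset(dataset_name, split=split)

    # The one-line integration: hand the dataset to Spotlight.
    spotlight.show(ds)
```

Calling `explore()` should load the dataset and start Spotlight so you can browse the data interactively.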
You can find more info on the Hugging Face blog: https://huggingface.co/blog/scalable-data-inspection
Link to repo: https://github.com/Renumics/spotlight
Happy to hear your feedback!