We built a library to interactively explore unstructured datasets directly from a dataframe: https://github.com/Renumics/spotlight
Some background: We have worked on different ML solutions over the years, mainly in the industrial AI space. A crucial step for us is always to inspect and explore the data interactively with the team and the customer. This holds throughout the development process: during EDA, model debugging, model comparison, and monitoring.
We have tried many different options for visualizing unstructured datasets in the past: notebooks, Dash apps, custom React apps, HTML reports... However, these options were very time-consuming to develop and maintain, not interactive enough, or both.
Our goal with Spotlight is to provide visualization for multimodal data with one line of code. Currently, Spotlight supports most unstructured data types including images, audio, text, videos, time-series and geometric data.
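To give a rough idea of what "one line of code" looks like, here is a minimal sketch. The file paths, column names, and the helpers `build_rows` and `explore` are made up for illustration. The `spotlight.show` call with a `dtype` mapping follows the usage shown in the repo's README as we understand it, so please check the repo for the current API. It assumes `pip install renumics-spotlight pandas`.

```python
def build_rows():
    """Return a tiny table of samples; the file paths are hypothetical placeholders."""
    return [
        {"image": "data/cat.png", "audio": "data/cat.wav", "label": "cat"},
        {"image": "data/dog.png", "audio": "data/dog.wav", "label": "dog"},
    ]


def explore(rows):
    """Load the rows into a dataframe and open them in Spotlight."""
    # Imports are local so that build_rows() can be used without the
    # third-party packages installed.
    import pandas as pd
    from renumics import spotlight

    df = pd.DataFrame(rows)
    # The actual "one line": the dtype mapping tells Spotlight to treat the
    # string columns as paths to images and audio files instead of plain text.
    spotlight.show(df, dtype={"image": spotlight.Image, "audio": spotlight.Audio})


# Usage (starts a local server and opens the browser UI):
# explore(build_rows())
```

Columns without an entry in the `dtype` mapping (here `label`) are left to Spotlight's own type inference.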
We like the Hugging Face tooling and have a first integration for HF datasets. Improving this integration is the next step on our roadmap.
You can find more info and use case examples for ML and engineering workflows in the repo.