Unlike existing data profilers, Desbordante focuses on discovering complex patterns in data, which are notoriously hard to extract. Since its launch in 2019, it has become the fastest open-source tool for these tasks, while also offering many patterns which have no alternative implementations. We already showcased it here back in April — see https://news.ycombinator.com/item?id=40063137.
With this release, Desbordante supports 24 types of patterns, 32 types of pattern-related tasks, and 34 algorithms for their discovery or validation. The key updates are:
1) Discovery of many exciting patterns, with differential and matching dependencies being the most interesting (take a look at the new example scripts in our repo for a real-world demo)
2) Five new metrics used to define subtypes of approximate functional dependencies — they’re gaining popularity in the academic community for a wide selection of use-cases
3) Validation of denial constraints, which let users check their own complex data constraints (each expressed as a Boolean formula) over a table
4) Groundwork for dynamic algorithms, a novel type of pattern validation
- They track changes in the dataset and update their result on the fly instead of processing the whole table again
- An example algorithm is included: a dynamic functional dependency validator with a Python interface
5) The set of examples expanded to 33, across three categories:
- Basic: a clear demo of a single pattern
- Advanced: exploration of pattern variations and complexities
- Expert: real-world solutions which use pattern discovery and validation
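To make item 3 more concrete: a denial constraint states that no pair of rows may satisfy a given combination of predicates at the same time. Below is a minimal sketch of a naive, quadratic check in plain Python. It is not Desbordante's API or its algorithm; the table, the column names, and the `find_dc_violations` helper are all hypothetical and exist only for illustration.

```python
from itertools import permutations
import operator

# Hypothetical table: each row is a dict of column -> value.
rows = [
    {"state": "NY", "salary": 5000, "tax": 0.20},
    {"state": "NY", "salary": 6000, "tax": 0.25},
    {"state": "NY", "salary": 7000, "tax": 0.22},  # earns more than the previous row, taxed less
    {"state": "TX", "salary": 9000, "tax": 0.10},
]

# Denial constraint over a pair of rows (t, s):
#   not (t.state == s.state and t.salary < s.salary and t.tax > s.tax)
# Each entry compares the same column of t and s with the given operator.
dc = [
    ("state", operator.eq),
    ("salary", operator.lt),
    ("tax", operator.gt),
]

def find_dc_violations(rows, predicates):
    """Return index pairs (i, j) of ordered row pairs that satisfy every predicate."""
    violations = []
    for (i, t), (j, s) in permutations(enumerate(rows), 2):
        if all(op(t[col], s[col]) for col, op in predicates):
            violations.append((i, j))
    return violations

violations = find_dc_violations(rows, dc)
print(violations)
```

The constraint holds over the table exactly when the returned list is empty. A real validator would avoid comparing every pair of rows; the brute-force loop is only meant to show what "holds" means.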
Of course, we have fixed several critical bugs, added new algorithms for existing patterns, and made some of the old ones run faster.
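For readers curious about the dynamic algorithms from item 4, here is a rough sketch of the general idea: keep a small summary of the table and adjust it on every insert or delete, so that the answer to "does the functional dependency still hold?" is available without rescanning the data. This is plain Python written for illustration under our own assumptions, not the validator shipped in Desbordante; the `DynamicFDValidator` class and its methods are hypothetical names.

```python
from collections import Counter, defaultdict

class DynamicFDValidator:
    """Incrementally tracks whether lhs -> rhs holds as rows are inserted and deleted."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs
        # For each LHS value: how many rows carry each RHS value.
        self.groups = defaultdict(Counter)
        # Number of LHS values currently mapped to more than one RHS value.
        self.conflicting = 0

    def _key(self, row):
        return tuple(row[c] for c in self.lhs), tuple(row[c] for c in self.rhs)

    def insert(self, row):
        left, right = self._key(row)
        group = self.groups[left]
        before = len(group)
        group[right] += 1
        # The group just went from one distinct RHS value to two: a new conflict.
        if before == 1 and len(group) == 2:
            self.conflicting += 1

    def delete(self, row):
        left, right = self._key(row)
        group = self.groups.get(left)
        if group is None or group[right] == 0:
            raise KeyError("row is not in the tracked table")
        before = len(group)
        group[right] -= 1
        if group[right] == 0:
            del group[right]
        # The group just went from two distinct RHS values to one: conflict resolved.
        if before == 2 and len(group) == 1:
            self.conflicting -= 1
        if not group:
            del self.groups[left]

    def holds(self):
        return self.conflicting == 0

validator = DynamicFDValidator(lhs=["zip"], rhs=["city"])
validator.insert({"zip": "10001", "city": "New York"})
validator.insert({"zip": "10001", "city": "New York"})
holds_initially = validator.holds()

bad_row = {"zip": "10001", "city": "Newark"}
validator.insert(bad_row)
holds_after_insert = validator.holds()

validator.delete(bad_row)
holds_after_delete = validator.holds()
print(holds_initially, holds_after_insert, holds_after_delete)
```

Each update touches only the group of rows sharing the same left-hand side value, so its cost does not depend on the table size.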
More about the release can be found here: https://github.com/Desbordante/desbordante-core/releases/tag... and more about the tool here: https://github.com/Desbordante/desbordante-core.