The first time I developed a big data application with Apache Spark, my Spark job couldn’t finish because I had partitioned the data incorrectly and accidentally wrote millions of extremely small files to S3...
As a backend developer, I was amazed that in big data there isn't an APM solution (like Datadog or New Relic) to debug this kind of performance issue.
DataFlint is my attempt to solve this problem. I'd really appreciate feedback!
Show HN: DataFlint, performance monitoring for Apache Spark