Those of you who have stumbled upon ML before will know that Python is the go-to language for data-related work. It has high-quality libraries for analysis, modeling, and visualization. scikit-learn is a notable example, and for good reasons: it's well maintained, has a large community, is performant, and has a really good API (there's a paper about how they designed it: https://arxiv.org/abs/1309.0238).
I had been looking for a Scala equivalent for quite some time and finally decided to start coding it myself. The main reason is that JVM-based languages are very common for building data pipelines, and being able to serve predictive models directly within the pipeline offers several advantages. Here's some data to back up my claims: https://cloud.google.com/solutions/comparing-ml-model-predic... (comparison of serving the model within the pipeline vs. calling a REST API).
The project currently has two main goals: to expose its functionality through an intuitive API (mimicking scikit-learn while using idiomatic Scala features and functional constructs) and to provide performant implementations of common algorithms (here is a limited set of comparisons with scikit-learn implementations: https://github.com/picnicml/doddle-benchmark).
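To give a rough idea of what "scikit-learn-like, but with functional constructs" can mean, here is a minimal sketch in plain Scala. Every name in it (`Estimator`, `Predictor`, `SimpleLinearRegression`, `TrainedLine`) is hypothetical and chosen for illustration only; it is not the doddle-model API. The one idea it shows is that `fit` returns a new, immutable trained model instead of mutating the estimator in place.

```scala
// Illustrative sketch only: all names are hypothetical, not the doddle-model API.
object ImmutableEstimatorSketch {

  // An untrained estimator: fit returns a new trained model and mutates nothing.
  trait Estimator[M] {
    def fit(x: Vector[Double], y: Vector[Double]): M
  }

  // A trained model only knows how to predict.
  trait Predictor {
    def predict(x: Vector[Double]): Vector[Double]
  }

  // Immutable result of training: the fitted parameters are plain values.
  final case class TrainedLine(slope: Double, intercept: Double) extends Predictor {
    def predict(x: Vector[Double]): Vector[Double] =
      x.map(xi => slope * xi + intercept)
  }

  // One-feature ordinary least squares, solved in closed form.
  case object SimpleLinearRegression extends Estimator[TrainedLine] {
    def fit(x: Vector[Double], y: Vector[Double]): TrainedLine = {
      require(x.nonEmpty && x.length == y.length, "x and y must be non-empty and of equal length")
      val meanX = x.sum / x.length
      val meanY = y.sum / y.length
      val covXY = x.zip(y).map { case (xi, yi) => (xi - meanX) * (yi - meanY) }.sum
      val varX = x.map(xi => (xi - meanX) * (xi - meanX)).sum
      require(varX > 0.0, "x must contain at least two distinct values")
      val slope = covXY / varX
      TrainedLine(slope, meanY - slope * meanX)
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Vector(0.0, 1.0, 2.0, 3.0)
    val y = Vector(1.0, 3.0, 5.0, 7.0) // generated from y = 2x + 1
    val trained = SimpleLinearRegression.fit(x, y) // a new value; the estimator is unchanged
    println(trained.predict(Vector(4.0)))
  }
}
```

Because the trained model is a separate immutable value, the types distinguish "not yet fitted" from "fitted", so calling `predict` before `fit` is a compile-time error rather than a runtime one.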
Here are some links if you are interested in taking a look:
- website: https://picnicml.github.io
- GitHub repo: https://github.com/picnicml/doddle-model
- code examples: https://github.com/picnicml/doddle-model-examples
- a blog post: https://towardsdatascience.com/recognising-handwritten-digit...