Show HN: Machine Learning on the JVM, or a piece of cake

8 years ago

Hello, people of HN. Let me say up front that this post promotes an open-source project I've been working on for the past 10 months or so. I'm sharing it here in the hope of getting in touch with other developers who are interested in machine learning, Scala, or both.

Those of you who have stumbled upon ML before will know that Python is the go-to language for data-related work. It has high-quality libraries for analysis, modeling, and visualization. scikit-learn is a notable example, and for good reason: it's well maintained, has a large community, is performant, and has a really good API (there's a paper about how it was designed: https://arxiv.org/abs/1309.0238).

I had been looking for a Scala equivalent for quite some time before finally deciding to write one myself. The main reason is that JVM-based languages are very common for building data pipelines, and serving predictive models directly within the pipeline offers several advantages, such as avoiding a network round trip for every prediction. Here's some data to back up my claims: https://cloud.google.com/solutions/comparing-ml-model-predic... (a comparison of serving the model within the pipeline vs. calling a REST API).
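As a rough illustration of what serving the model within the pipeline means, here is a minimal sketch, assuming a linear model whose weights were trained offline. The `Record`, `LinearModel` and `InPipelineScoring` names and the weights are hypothetical; the point is only that scoring becomes an ordinary function call inside a pipeline step, with no network hop per record.

```scala
// Hypothetical record flowing through a JVM data pipeline.
final case class Record(id: String, features: Vector[Double])

// Hypothetical linear model whose weights were learned offline and are
// loaded once when the pipeline starts.
final case class LinearModel(weights: Vector[Double], bias: Double) {
  // Prediction is a plain in-process function call: dot product plus bias.
  def score(x: Vector[Double]): Double = {
    require(x.length == weights.length, "feature dimension mismatch")
    weights.zip(x).map { case (w, xi) => w * xi }.sum + bias
  }
}

object InPipelineScoring {
  val model = LinearModel(Vector(0.5, -1.0), bias = 0.25)

  // Each record is scored inside the pipeline's own map step,
  // instead of being sent to a separate prediction service over HTTP.
  def enrich(records: Iterator[Record]): Iterator[(String, Double)] =
    records.map(r => r.id -> model.score(r.features))
}

val scored = InPipelineScoring.enrich(Iterator(
  Record("a", Vector(1.0, 0.0)),
  Record("b", Vector(2.0, 1.0))
)).toList
println(scored) // List((a,0.75), (b,0.25))
```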

The project currently has two main goals: to expose its functionality through an intuitive API (mimicking scikit-learn but using idiomatic Scala features and functional constructs), and to provide performant implementations of common algorithms (here is a limited set of comparisons with the scikit-learn implementations: https://github.com/picnicml/doddle-benchmark).
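Here is a minimal sketch of what a scikit-learn-style API built from functional constructs can look like in Scala, assuming type classes for fitting and predicting. All names (`SketchApi`, `Estimator`, `Predictor`, `MeanRegressor`, `ModelOps`) are hypothetical illustrations, not doddle-model's actual API; `MeanRegressor` is a trivial baseline in the spirit of scikit-learn's `DummyRegressor`.

```scala
// Hypothetical scikit-learn-style API built from Scala type classes.
// Illustrative names only; this is NOT doddle-model's actual API.
object SketchApi {
  type Features = Vector[Vector[Double]]
  type Target = Vector[Double]

  // Type class for models that can be fitted: returns a new value, no mutation.
  trait Estimator[M] {
    def fit(model: M, x: Features, y: Target): M
  }

  // Type class for models that can predict.
  trait Predictor[M] {
    def predict(model: M, x: Features): Target
  }

  // Baseline regressor that always predicts the mean of the training targets.
  final case class MeanRegressor(mean: Option[Double] = None)

  object MeanRegressor {
    implicit val estimator: Estimator[MeanRegressor] = new Estimator[MeanRegressor] {
      def fit(model: MeanRegressor, x: Features, y: Target): MeanRegressor = {
        require(y.nonEmpty, "cannot fit on an empty target")
        model.copy(mean = Some(y.sum / y.length))
      }
    }

    implicit val predictor: Predictor[MeanRegressor] = new Predictor[MeanRegressor] {
      def predict(model: MeanRegressor, x: Features): Target = {
        val m = model.mean.getOrElse(throw new IllegalStateException("model is not fitted"))
        x.map(_ => m) // one prediction per input row
      }
    }
  }

  // Extension syntax so call sites read like scikit-learn: model.fit(x, y).predict(x)
  implicit class ModelOps[M](model: M) {
    def fit(x: Features, y: Target)(implicit e: Estimator[M]): M = e.fit(model, x, y)
    def predict(x: Features)(implicit p: Predictor[M]): Target = p.predict(model, x)
  }
}

import SketchApi._

val x = Vector(Vector(1.0), Vector(2.0), Vector(3.0))
val y = Vector(2.0, 4.0, 6.0)

val unfitted = MeanRegressor()
val fitted = unfitted.fit(x, y)
println(fitted.predict(Vector(Vector(10.0), Vector(20.0)))) // Vector(4.0, 4.0)
println(unfitted.mean) // None: fitting did not modify the original value
```

The main design difference from scikit-learn in this sketch is that `fit` returns a new fitted value instead of mutating the estimator in place and returning `self`, so the unfitted `MeanRegressor()` stays reusable and the compiler only allows `fit`/`predict` on types that have the corresponding type class instance.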

Here are some links if you're interested in taking a look:
- website: https://picnicml.github.io
- GitHub repo: https://github.com/picnicml/doddle-model
- code examples: https://github.com/picnicml/doddle-model-examples
- a blog post: https://towardsdatascience.com/recognising-handwritten-digit...