Deep Learning Translation: NLLB 200 vs. M2M100 vs. Opus MT

4 years ago

Hello,

Recently I've extensively tested Facebook's NLLB 200 3.3B (https://huggingface.co/facebook/nllb-200-3.3B) and M2M100 1.2B (https://huggingface.co/facebook/m2m100_1.2B) models for deep learning translation, as well as Helsinki's Opus MT (https://huggingface.co/Helsinki-NLP/).

My goal is to offer the best translation model on NLP Cloud (https://nlpcloud.com) while keeping server costs minimal and maintenance as easy as possible. Here are my conclusions:

- Opus MT gives good results and latency is very good, but it requires 1 model per language pair. That makes it a good candidate if you are only using one language pair, but not if you're using hundreds of languages. Besides, many language pairs are actually missing (Norwegian, for example, doesn't seem to be supported).

- M2M100 can translate between 100 languages, which makes it much easier to use than Opus MT if you need several languages. But quality is below Opus MT in my tests, and adult content isn't supported (the model replaces sexual content with funny words, for example). Latency is worse than with Opus MT and it requires more advanced hardware (without a GPU, latency is really long).

- NLLB 200 can translate between 200 languages, which makes it even more attractive! Quality seems to be on par with Opus MT in the languages we've tested. The model does not enforce any sort of filtering on adult content. Latency is still a bit worse than with Opus MT and it requires even more advanced hardware.
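
To put a number on the "1 model per language pair" problem, here is a rough sketch of the arithmetic. It assumes one model per direction (en to fr and fr to en count separately) and uses the `Helsinki-NLP/opus-mt-{src}-{tgt}` naming pattern that the bilingual models on the Hub appear to follow. Both function names are made up for illustration, and an ID generated this way is not guaranteed to exist.

```python
from itertools import permutations


def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


def opus_mt_model_ids(languages):
    """Hypothetical helper: one candidate Hub ID per directed pair.

    Assumes the Helsinki-NLP/opus-mt-{src}-{tgt} naming pattern;
    many of the generated IDs may not exist on the Hub.
    """
    return [
        f"Helsinki-NLP/opus-mt-{src}-{tgt}"
        for src, tgt in permutations(languages, 2)
    ]


print(directed_pairs(100))  # 9900 bilingual models to cover 100 languages
print(directed_pairs(200))  # 39800 bilingual models to cover 200 languages
print(opus_mt_model_ids(["en", "fr", "de"]))
```

With a single multilingual model like M2M100 or NLLB 200, that count stays at 1 no matter how many languages you add, which is where the maintenance savings come from.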

So my conclusion is that NLLB is the best candidate for NLP Cloud.
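
For anyone who wants to try NLLB directly, here is a minimal sketch using the Hugging Face `transformers` library. It is not how NLP Cloud serves the model. The `NLLB_CODES` mapping and the `nllb_code` and `translate` helpers are my own illustrative names, and mapping "no" to `nob_Latn` (Norwegian Bokmål) is an assumption. NLLB expects FLORES-200 style language codes rather than two-letter codes, hence the lookup table.

```python
# Assumed mapping from two-letter codes to the FLORES-200 codes NLLB expects.
NLLB_CODES = {
    "en": "eng_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "es": "spa_Latn",
    "no": "nob_Latn",  # assumption: Norwegian Bokmål
}


def nllb_code(lang: str) -> str:
    """Look up the NLLB language code, failing loudly on unknown input."""
    try:
        return NLLB_CODES[lang]
    except KeyError:
        raise ValueError(f"no NLLB code configured for {lang!r}") from None


def translate(text: str, src: str, tgt: str,
              model_name: str = "facebook/nllb-200-3.3B") -> str:
    """Translate one string. Loads the model on every call for simplicity;
    a real service would load it once and keep it on a GPU."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=nllb_code(src))
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # The target language is selected by forcing its code as the first token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(nllb_code(tgt)),
        max_new_tokens=256,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate("Hello, how are you?", "en", "no")` downloads the 3.3B checkpoint on first use, so expect a long first run and slow inference without a GPU.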

Have you made similar comparisons on your end? If so, I would love to hear your opinion!

Julien