Show HN: Reverb ASR+Diarization, the Best Open Source ASR for Long-Form Audio


2 years ago

Hey everyone,

My name is Lee Harris and I'm the VP of Engineering for Rev.com / Rev.ai.

Today, we are launching and open sourcing our current generation ASR models named "Reverb."

When OpenAI launched Whisper at Interspeech two years ago, it turned the ASR world upside down. Today, Rev is building on that foundation with Reverb, the world's #1 ASR model for long-form transcription – now open-source.

I am proud to announce that we are releasing two models today, Reverb and Reverb Turbo, available through our API, as a self-hosted deployment, and as an open source + open weights release.

----------

We are releasing in the following formats:

- A research-oriented release that doesn't include our end-to-end pipeline and is missing our WFST (Weighted Finite-State Transducer) implementation. This is primarily in Python and intended for research, exploration, or custom usage within your ecosystem.

- A developer-oriented release that includes our entire end-to-end pipeline for environments at any scale. It is a combination of C# for the APIs, C++ for our inference engine, and Python for various pieces.

- A new set of end-to-end APIs that are priced at $0.20/hour for Reverb and $0.10/hour for Reverb Turbo.

----------

What makes Reverb special?

- Reverb was trained on 200,000+ hours of extremely high-quality, varied audio transcribed by Rev.com expert transcribers. This data set was selected as a subset of 7+ million hours of Rev audio.

- The model runs extremely well on CPU, GPU, iOS/Android, IoT, and other platforms. Our developer implementation is primarily optimized for x64 CPU today, but a GPU optimized version will be released this year.

- The model excels in noisy, real-world environments. Real data was used during training, and every audio file was handled by an expert transcriptionist.

- You can tune your results for verbatimicity, allowing you to have nicely formatted, opinionated outputs OR true verbatim output. This is the #1 area where Reverb substantially outperforms the competition.

----------

Benchmarks

Here are some WER (word error rate) benchmarks on Rev's various solutions for Earnings21 and Earnings22 (very challenging audio):

- Reverb

  - Earnings21: 7.99 WER

  - Earnings22: 7.06 WER

- Reverb Turbo

  - Earnings21: 8.25 WER

  - Earnings22: 7.50 WER

- Reverb Research

  - Earnings21: 10.30 WER

  - Earnings22: 9.08 WER

- Whisper large-v3

  - Earnings21: 10.67 WER

  - Earnings22: 11.37 WER

- Canary-1B

  - Earnings21: 13.82 WER

  - Earnings22: 13.24 WER
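For anyone who hasn't worked with WER before, here is a minimal Python sketch of how the metric is usually defined: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, expressed as a percentage. This is not Rev's scoring pipeline; the function names `wer` and `relative_reduction` are made up for illustration, and real benchmark scoring normally applies text normalization (casing, punctuation, numerals) before comparing, which this sketch skips.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, assuming whitespace tokenization
    and no text normalization (both are simplifying assumptions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent versus a baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# One dropped word out of four reference words.
print(wer("the quick brown fox", "the quick fox"))

# Using the Earnings21 numbers listed above: Whisper large-v3 vs. Reverb.
print(round(relative_reduction(10.67, 7.99), 1))
```

Plugging in the listed numbers, Reverb's relative WER reduction versus Whisper large-v3 works out to roughly 25% on Earnings21 and roughly 38% on Earnings22.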

----------

Licensing

Our models are released under a non-commercial / research license that allows for personal, research, and evaluation use. If you wish to use them for commercial purposes, you have 3 options:

- Usage-based API @ $0.20/hr for Reverb, $0.10/hr for Reverb Turbo.

- Usage-based self-hosted container at the same price as our API.

- Unlimited use license at custom pricing. Contact us at [email protected].
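To get a rough feel for the usage-based pricing, here is a small Python sketch that estimates cost from audio duration. The per-hour prices come from this post; everything else is an assumption for illustration: the name `estimate_cost`, the model keys, and especially the idea that billing is strictly proportional to duration with no minimums or rounding increments, which is not confirmed here.

```python
from decimal import Decimal

# Prices per audio hour as stated in this post (USD).
PRICE_PER_HOUR = {
    "reverb": Decimal("0.20"),
    "reverb_turbo": Decimal("0.10"),
}


def estimate_cost(audio_seconds: int, model: str) -> Decimal:
    """Estimated cost in USD, assuming purely proportional billing
    (an assumption; actual billing rules may differ)."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR[model]).quantize(Decimal("0.01"))


# 1,000 hours of audio on Reverb, and a 90-minute file on Reverb Turbo.
print(estimate_cost(1000 * 3600, "reverb"))
print(estimate_cost(90 * 60, "reverb_turbo"))
```

Under that proportional-billing assumption, 1,000 hours of audio comes to about $200 on Reverb and about $100 on Reverb Turbo.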

----------

Final Thoughts

I highly recommend that anyone interested take a look at our fantastic technical blog post written by one of our Staff Speech Scientists, Jenny Drexler Fox. We look forward to hearing community feedback and to sharing even more of our models and research in the near future. Thank you!

----------

Links

Technical blog: https://www.rev.com/blog/speech-to-text-technology/introduci...

Launch blog / news post: https://www.rev.com/blog/speech-to-text-technology/open-sour...

GitHub research release: https://github.com/revdotcom/reverb

GitHub self-hosted release: https://github.com/revdotcom/reverb-self-hosted

Hugging Face ASR link: https://huggingface.co/Revai/reverb-asr

Hugging Face Diarization V1 link: https://huggingface.co/Revai/reverb-diarization-v1

Hugging Face Diarization V2 link: https://huggingface.co/Revai/reverb-diarization-v2
