* https://huggingface.co/facebook/wav2vec2-large-robust-ft-swbd-300h
* https://huggingface.co/datasets/sil-ai/bloom-speech
* https://huggingface.co/spaces/Matthijs/speecht5-asr-demo
* https://alphacephei.com/vosk/
It's a bit hard to search for, since "speech to text" results are usually mixed in with results for "text to speech" (which apparently has many more results overall).
In any case, the ones I found seem to require quite a bit of hardware and/or tinkering to experiment with.
So I'm asking: has anyone here already experimented a bit in this space, and would you share your experiences? It would also be great to learn what hardware you used.