Things like semantic search also work much better when embeddings reflect the semantic segments in the text, since that is what the embedding models were trained on (at least, for those trained on MS MARCO). But there aren't any quickly usable Hugging Face models or open source tools to convert plaintext into semantically chunked text.
That's why I've launched this Semantic Text Chunking API, based on an internal product we rely on. The demo lets you try it yourself and see a sample on a transcript, and there's a 50-request free tier.
I admired the Audacity Autochapter API (which is great for audio) but really wanted a quick'n'easy way to get the same value out of text. I found out why there isn't one on Hugging Face: it's really hard to get right and optimise for different input sizes. There are models like https://huggingface.co/dennlinger/bert-wiki-paragraphs but these require you to first split the text into sentences manually, make sure the model works for your domain, and set up a batch process on a GPU for decent speed. In other words, it's a right pain to get this working, despite there being some great open source models to base it on.
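For anyone curious what the DIY route looks like, here's a rough sketch of the steps above: split into sentences, score each adjacent pair, and start a new chunk wherever the score drops. This is not how our API works internally. The names `split_sentences`, `lexical_overlap`, `chunk_text` and the `threshold` value are all made up for illustration, and the scorer is a crude word-overlap stand-in so the snippet runs without a GPU. In practice you'd swap in a trained pairwise model such as the one linked above.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lexical_overlap(a: str, b: str) -> float:
    """Stand-in scorer (NOT a trained model): Jaccard overlap of lowercase words."""
    words_a = set(re.findall(r"[a-z']+", a.lower()))
    words_b = set(re.findall(r"[a-z']+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def chunk_text(
    text: str,
    same_topic_score: Callable[[str, str], float] = lexical_overlap,
    threshold: float = 0.1,  # hypothetical cut-off; needs tuning per domain
) -> List[str]:
    """Group consecutive sentences, starting a new chunk when the score is low."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if same_topic_score(prev, cur) >= threshold:
            chunks[-1].append(cur)  # same topic: extend the current chunk
        else:
            chunks.append([cur])  # topic shift: open a new chunk
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = "Cats purr. Cats sleep. Stocks fell today. Stocks rose later."
    for chunk in chunk_text(sample):
        print(chunk)
```

Even in this toy form you can see where the pain comes from: the sentence splitter, the scorer and the threshold all need tuning per domain, and a real model needs batching to run at a decent speed.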
Having wasted a week of dev time on what we thought would be a one-day task, we've published a freemium API for this. It can improve any GPT-based tool and give you an edge over tools that don't semantically segment their content.
1) Freemium - you can demo it now
2) We've added 2 options to help you control the output (feel free to request more options)
3) No data sent to the API is ever saved by us
Quick demo: https://rapidapi.com/rapidinterconnect/api/text-semantic-spl...
HN users - feel free to contact the account for an increased soft limit on the free version =)