Show HN: New Semantic text chunking API for GPT tech (Freemium live on RapidAPI)

3 points

3 years ago

When working with NLP, I constantly run into this problem: I have a bunch of text from a transcript or web scrape and need to group it semantically (by meaning) for GPT prompting. Anybody looking to combine GPT models with semantic search has probably noticed that how you split the snippets you provide really matters! For example, if you feed the latest models text that spans a bunch of sporadic topics in the first half and one core topic in the second half, then ask for a summary, it might miss the second-half topic, even if that was the main topic of the text.

Things like semantic search also work much better when embeddings reflect the semantic segments in the text, since passage-level segments are what the embedding models were trained on (at least, for MS MARCO). But there aren't any quickly usable Hugging Face models or open source tools to convert plaintext into semantically chunked text.

That's why I've launched this Semantic Text Chunking API, based on an internal product we rely on. The demo lets you try it yourself and see a sample on a transcript, and there's a free tier of 50 requests.

I admired Audacity Autochapter API (which is great for audio) but really wanted a quick'n'easy way to get the same value out of text. I found out why there isn't one on Hugging Face: it's really hard to get right and optimise for different input sizes. There are models like https://huggingface.co/dennlinger/bert-wiki-paragraphs but these require you to first split the text into sentences yourself, make sure the model works for your domain, and set up a batch process on a GPU for decent speed - in other words, it's a right pain to get this working, despite some great open source models to base this on.
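To make the manual steps concrete, here is a rough sketch of the do-it-yourself pipeline: split into sentences, score each adjacent pair, and start a new chunk when the score drops. This is not how the API works internally. The function names, the regex splitter, and the 0.1 threshold are all assumptions for illustration, and the word-overlap scorer is only a cheap stand-in for a real model (in practice you would plug a batched model call into `same_topic_score`).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): word overlap between two sentences, 0.0 to 1.0."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def chunk_text(
    text: str,
    same_topic_score: Callable[[str, str], float] = jaccard_similarity,
    threshold: float = 0.1,  # hypothetical cut-off; tune per domain
) -> List[str]:
    """Group consecutive sentences, splitting where the adjacent-pair score is low."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if same_topic_score(prev, cur) >= threshold:
            chunks[-1].append(cur)   # same topic: extend the current chunk
        else:
            chunks.append([cur])     # topic shift: start a new chunk
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = (
        "Cats are small pets. Cats like to sleep. "
        "Python is a programming language. Python code is readable."
    )
    for chunk in chunk_text(sample):
        print(chunk)
```

Even this toy version shows where the pain is: the splitter breaks on abbreviations, the threshold needs tuning per domain, and a real model scorer has to be batched to be fast.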

Having wasted a week of dev time on what we thought would be a one-day task, we've published a freemium API for this. It can boost any GPT tool and give you an edge over other tools that don't semantically segment their content.

1) Freemium - you can demo it now
2) We've added 2 options to help you control this output (feel free to request more options)
3) No data sent to the API is ever saved by us

Quick demo: https://rapidapi.com/rapidinterconnect/api/text-semantic-spl...

HN users - feel free to contact the account for an increased soft limit on the free version =)