Show HN: Saturday Morning Breakfast Cereal Search Engine

4 years ago

Hey HN,

I've built a frontend search engine for SMBC Comics[0]. There was no existing text index, so I wrote a program that uses OpenCV's EAST text detector and Tesseract to extract the text and other metadata from all 6000+ comics. One challenge is that the searchable text is imperfect: OCR doesn't handle handwriting-style lettering well. Search runs entirely in the browser with Fuse.js[1].
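Fuzzy matching is what makes noisy OCR text searchable at all: a query for "breakfast" should still find a comic whose text came out as "br3akfast". Fuse.js does its own approximate matching (its documentation describes a modified Bitap algorithm). The TypeScript sketch below is not Fuse.js code. It is a plain dynamic-programming version of the same idea (Sellers' algorithm) under a hypothetical function name, `approxSubstringDistance`. It returns the smallest number of character edits needed to turn the query into some substring of a comic's text.

```typescript
// Hypothetical helper, not part of Fuse.js or this project.
// Sellers' algorithm: Levenshtein distance where the match may start and end
// anywhere in `text`, i.e. the best edit distance from `pattern` to any substring.
function approxSubstringDistance(pattern: string, text: string): number {
  const p = pattern.toLowerCase();
  const t = text.toLowerCase();
  const m = p.length;
  // prev[i] = cost of aligning p[0..i) so that it ends at the previous text position.
  let prev: number[] = Array.from({ length: m + 1 }, (_, i) => i);
  let best = prev[m];
  for (let j = 1; j <= t.length; j++) {
    const curr: number[] = new Array(m + 1);
    curr[0] = 0; // a match may begin at any text position at no cost
    for (let i = 1; i <= m; i++) {
      const cost = p[i - 1] === t[j - 1] ? 0 : 1;
      curr[i] = Math.min(
        prev[i - 1] + cost, // match or substitution
        prev[i] + 1,        // extra character in the text
        curr[i - 1] + 1,    // pattern character missing from the text
      );
    }
    best = Math.min(best, curr[m]); // the match may end here
    prev = curr;
  }
  return best;
}
```

For example, `approxSubstringDistance("breakfast", "br3akfast cereal")` is 1, so a threshold of one or two edits would still surface that comic. Each call costs O(query length × text length), so on its own this is no faster than scanning all of the text.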

The comics are available to search at https://sambhavsaggi.github.io/comic-fulltext-search/

It's currently a little slow because it searches through megabytes of text in the browser, but every response time I've measured is under 1.5 seconds on my laptop (your mileage may vary).
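Assuming each query currently scans the full text of every comic, one common way to avoid that is a trigram inverted index: precompute which comics contain each 3-character sequence, then run the expensive fuzzy scorer only on comics that share enough trigrams with the query. The sketch below is a hypothetical illustration, not part of the project; the `ComicText` record (`id`, `text`) and the `TrigramIndex` class are assumptions, since the post doesn't describe the data format.

```typescript
// Hypothetical record shape; the real index format isn't described in the post.
interface ComicText {
  id: string;
  text: string;
}

// Distinct 3-character sequences of the lowercased, whitespace-collapsed text.
function trigrams(s: string): Set<string> {
  const norm = s.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= norm.length; i++) {
    grams.add(norm.slice(i, i + 3));
  }
  return grams;
}

class TrigramIndex {
  // trigram -> ids of comics whose text contains it
  private postings = new Map<string, Set<string>>();
  private allIds: string[] = [];

  constructor(comics: ComicText[]) {
    for (const comic of comics) {
      this.allIds.push(comic.id);
      for (const gram of trigrams(comic.text)) {
        let ids = this.postings.get(gram);
        if (!ids) {
          ids = new Set<string>();
          this.postings.set(gram, ids);
        }
        ids.add(comic.id);
      }
    }
  }

  // Ids of comics sharing at least `minShared` distinct trigrams with the query.
  // Queries shorter than 3 characters can't be filtered, so every id is returned.
  candidates(query: string, minShared: number): string[] {
    const grams = trigrams(query);
    if (grams.size === 0) return [...this.allIds];
    const shared = new Map<string, number>();
    for (const gram of grams) {
      for (const id of this.postings.get(gram) ?? []) {
        shared.set(id, (shared.get(id) ?? 0) + 1);
      }
    }
    return Array.from(shared)
      .filter(([, count]) => count >= minShared)
      .map(([id]) => id);
  }
}
```

Only the returned ids would then go to the fuzzy search. Choosing `minShared` trades speed for recall, because OCR errors break trigrams: each character edit touches at most three of the query's trigram positions, so if a comic contains the query with at most k edits, at least g - 3k of the query's g distinct trigrams still appear in it. Setting `minShared` to g - 3k therefore keeps every such comic; when that bound is zero or less, the filter can't rule anything out and a full scan is needed.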

Some things I could use help with: better text extraction from the comics, and faster/better search (maybe in WASM?).
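On the extraction side, some noise can be removed after Tesseract runs and before the text is indexed. Below is a hypothetical `normalizeOcrText` cleanup pass (not part of the comic-ocr repo), assuming the OCR output is plain text with line breaks. It rejoins words hyphenated across lines, collapses whitespace, and lowercases everything so later matching doesn't depend on the lettering's case.

```typescript
// Hypothetical post-OCR cleanup pass; not part of the comic-ocr repo.
function normalizeOcrText(raw: string): string {
  return raw
    // Rejoin a word hyphenated across a line break: "BREAK-\nFAST" -> "BREAKFAST".
    .replace(/(\w)-[ \t]*\r?\n\s*(\w)/g, "$1$2")
    // Collapse runs of spaces, tabs, and newlines into single spaces.
    .replace(/\s+/g, " ")
    .trim()
    // Lowercase so matching doesn't depend on the lettering's case.
    .toLowerCase();
}
```

The hyphen rule also merges genuine compounds that happen to break at a line end ("half-\nbaked" becomes "halfbaked"), which is usually an acceptable loss for search but worth knowing about.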

---

The code for this project is available here:

Frontend: https://github.com/sambhavsaggi/comic-fulltext-search

Text extractor: https://github.com/sambhavsaggi/comic-ocr

[0] https://www.smbc-comics.com

[1] https://fusejs.io/

---

Thanks!