Thirukural, aka Kural <https://en.wikipedia.org/wiki/Kural>, is an ancient Tamil text passed down from generation to generation for at least two millennia. It has been continually in print since the 1800s. It is astonishing that a largely secular text survived for centuries before printing presses were invented. It is a collection of 1330 couplets divided into 3 parts: Virtue, Wealth and Love. Each part has a collection of chapters, and each chapter has 10 couplets on a specific theme.
I try to read one or two couplets a day, and I have read most of them. As a kid, I was fascinated by the few folks who had memorized the whole text and could give one or two examples for any given scenario. A couple of times I tried to create a search engine that could find relevant Kurals for a given query. Most of my past efforts relied on keyword matching.
During the last Christmas break, I started this project to tackle the problem again. I think it largely works. Please take it for a spin and share your feedback. For those who grew up in the West, this query might be an interesting starting point.
- Can you find Kurals that are similar to the Bible phrase "turn the other cheek"?
How is it done? Each couplet comes with a few Tamil explanations, one English translation and a one-line English explanation. My first attempt was to create an embedding from the Tamil and English explanations. The search results were not that great. Then, for each couplet, I wrote a prompt containing the original text and the explanations, and asked ChatGPT to create a page-long summary. I created embeddings from these summaries and stored them along with some metadata.
When you search for something, it creates an embedding for the query, compares the query's embedding with each couplet's embedding and returns the top 10 matching couplets.
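As a rough illustration of that comparison step, here is a minimal Python sketch. It is not the project's actual code: the `Couplet` record, the function names and the toy 3-dimensional vectors are all made up for the example, and cosine similarity is an assumption, since the similarity metric is not stated above. In the real setup the comparison presumably runs inside the database rather than in application code.

```python
import math
from typing import List, NamedTuple, Tuple


class Couplet(NamedTuple):
    """Hypothetical record: a couplet number plus the embedding of its summary."""
    number: int
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (assumed metric, see lead-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query: List[float], couplets: List[Couplet], k: int = 10) -> List[Tuple[float, int]]:
    """Score every couplet against the query and keep the k best (score, number) pairs."""
    scored = [(cosine_similarity(query, c.embedding), c.number) for c in couplets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: real embeddings would have hundreds or thousands of dimensions.
toy_couplets = [
    Couplet(1, [1.0, 0.0, 0.0]),
    Couplet(2, [0.0, 1.0, 0.0]),
    Couplet(3, [0.9, 0.1, 0.0]),
]
results = top_matches([1.0, 0.0, 0.0], toy_couplets, k=2)
print([number for _, number in results])  # [1, 3]
```

With only 1330 couplets, a brute-force scan like this is cheap, so an approximate index is probably not needed at this scale.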
Tech Stack:
- ChatGPT
- Next.js
- Supabase with pgvector
What is next? (Would love your feedback)
- Improve the search. Currently all results are purely at the couplet level, and I would like to expand this. If you ask about a particular chapter or part, it does not do well; it simply returns the top 10 matching couplets. I would like to make it more like Perplexity.ai and provide links to chapters.
- I have an idea for a game that lets you pick 7 words for a given couplet and see how close your selection is to other people's.
- Tamil has a large body of ancient texts <https://en.wikipedia.org/wiki/Sangam_literature> that are very interesting to me. I would like to implement search on these as well.
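On the chapter-level search idea in the list above, one possible direction is to roll couplet hits up into chapters. The sketch below is only a thought experiment, not something the project does today. It assumes couplets are numbered 1 to 1330 in chapter order, so that 10 couplets per chapter gives 133 chapters, and that summing couplet scores is a reasonable way to rank chapters. `chapter_of` and `rank_chapters` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

COUPLETS_PER_CHAPTER = 10  # each chapter has 10 couplets
TOTAL_COUPLETS = 1330


def chapter_of(couplet_number: int) -> int:
    """Map couplet 1..1330 to chapter 1..133, assuming consecutive numbering."""
    if not 1 <= couplet_number <= TOTAL_COUPLETS:
        raise ValueError(f"couplet number out of range: {couplet_number}")
    return (couplet_number - 1) // COUPLETS_PER_CHAPTER + 1


def rank_chapters(hits: List[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Sum the scores of (score, couplet_number) hits per chapter, best chapter first."""
    totals: Dict[int, float] = defaultdict(float)
    for score, number in hits:
        totals[chapter_of(number)] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Two hits in chapter 1 outweigh a single stronger hit in chapter 2.
ranked = rank_chapters([(0.9, 1), (0.8, 5), (0.95, 11)])
print([chapter for chapter, _ in ranked])  # [1, 2]
```

Summing favors chapters with several relevant couplets. Taking the maximum score per chapter instead would favor the single best couplet, so the choice depends on what a chapter-level answer should mean.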