The core problem we are trying to solve is this: there are a lot of useful business insights you can get from text data, but it's hard to do analytics on it.
- SQL is built for tabular / structured data, but when it comes to text, the best you can do is keyword search.
- In the pre-LLM world, you might resort to training a lightweight text classifier, but you'd have to manually label a lot of data to get good accuracy. That means managing a team of operations people to do the labeling and building a lot of infrastructure to train and deploy the model.
- Today it’s easier to get good results with simple LLM prompts, but the "large" in "large language models" means that running them over production-scale datasets becomes prohibitively expensive.
Our solution is to impose structure on unstructured text with a combination of LLMs and lightweight embedding classifiers.
- Using Tidepool, a user can query the data by creating an "attribute." An attribute is a characteristic of the data that you want to analyze, defined in natural language. This could be “sentiment of reviews,” “messages mentioning legal topics,” “prompts containing code snippets,” etc.
- Tidepool structures the unstructured text by finding categories of interest for that attribute, for example "positive vs. negative vs. neutral sentiment" or "C++ vs. Python vs. JavaScript code snippets." We use an LLM to categorize a subset of the data, which the user can then review to refine the categorizations.
- We then use the LLM-categorized outputs to train a lightweight embedding classifier, which cheaply categorizes all existing and future data.
- A user can either chart the categorized outputs in Tidepool or export them back to their data warehouse for further analysis in a business intelligence tool like Mode / Looker.
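To make the label-then-distill idea in the steps above concrete, here is a minimal sketch under loudly stated assumptions. It is not Tidepool's implementation: `llm_categorize` is a hypothetical keyword stub standing in for an LLM prompt, `embed` is a hypothetical sparse bag-of-words stand-in for a real embedding model, and the classifier is a nearest-centroid model, swapped in as about the simplest "lightweight embedding classifier" there is. The shape is what matters: expensive labels on a small sample, cheap predictions on everything else.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text: str) -> dict[str, float]:
    """HYPOTHETICAL stand-in for a real embedding model: an L2-normalized
    bag-of-words vector, stored sparsely as {token: weight}."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both inputs are unit-length, so their dot product is the cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def llm_categorize(text: str) -> str:
    """HYPOTHETICAL stand-in for an LLM prompt such as "Classify the sentiment
    of this review as positive, negative, or neutral". A keyword rule keeps
    the sketch runnable offline."""
    t = text.lower()
    if any(w in t for w in ("love", "great", "excellent")):
        return "positive"
    if any(w in t for w in ("hate", "broken", "terrible")):
        return "negative"
    return "neutral"


class CentroidClassifier:
    """Nearest-centroid classifier over embeddings (swapped in for illustration):
    one unit-length centroid per category, predict by highest cosine similarity."""

    def fit(self, texts: list[str], labels: list[str]) -> "CentroidClassifier":
        sums: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            sums[label].update(embed(text))  # add this example's vector to its class sum
        self.centroids: dict[str, dict[str, float]] = {}
        for label, vec in sums.items():
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            self.centroids[label] = {tok: w / norm for tok, w in vec.items()}
        return self

    def predict(self, text: str) -> str:
        e = embed(text)
        return max(self.centroids, key=lambda label: cosine(e, self.centroids[label]))


# 1. The (stubbed) LLM labels a small sample; a user would review and refine these.
sample = [
    "I love this app, great job",
    "Excellent support team",
    "The app is broken again",
    "I hate the new update",
    "It opened fine",
    "Delivery arrived on Tuesday",
]
labels = [llm_categorize(t) for t in sample]

# 2. Distill the sample labels into the cheap classifier.
clf = CentroidClassifier().fit(sample, labels)

# 3. Categorize new data with no further LLM calls.
new_texts = ["great app, love it", "the update is broken", "delivery arrived fine"]
print([clf.predict(t) for t in new_texts])  # ['positive', 'negative', 'neutral']
```

In a real system you would swap in an actual embedding model and likely a trained head such as logistic regression; nearest-centroid is used here only because it fits in a few dependency-free lines.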
Our first use case was analyzing the prompts users submit to LLM apps. Our customers have tens of millions of user-submitted prompts, and they want to analyze usage patterns so they can improve their products. Tidepool helped them answer questions like:
- What are the most common types of prompts for different user groups?
- How common are different failure modes?
- What types of actions correlate strongly with success metrics like engagement?
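Once every prompt carries a category column, questions like the first one above turn into ordinary SQL, which is what exporting the results back to a data warehouse buys you. The sketch below uses Python's built-in `sqlite3` with a hypothetical `prompts` table and made-up toy rows (not real customer data) to find the most common category per user group.

```python
import sqlite3

# HYPOTHETICAL table: one row per prompt, with the classifier's category stored
# as an ordinary column next to existing metadata. Rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (user_group TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO prompts VALUES (?, ?)",
    [
        ("free", "code"), ("free", "code"), ("free", "writing"),
        ("pro", "writing"), ("pro", "writing"), ("pro", "code"),
        ("pro", "data analysis"),
    ],
)

# Category counts per user group: a plain GROUP BY, no keyword search needed.
rows = conn.execute(
    """
    SELECT user_group, category, COUNT(*) AS n
    FROM prompts
    GROUP BY user_group, category
    ORDER BY user_group, n DESC, category
    """
).fetchall()

# Rows are sorted by count within each group, so the first row per group is its top category.
top_category: dict[str, str] = {}
for group, category, n in rows:
    top_category.setdefault(group, category)

print(top_category)  # {'free': 'code', 'pro': 'writing'}
```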
Over time, we saw that our customers were using Tidepool not just to analyze user prompts in LLM apps but also to look at general text content (think user reviews, social media posts, documents, etc.). In principle this makes a lot of sense: it's all just text at the end of the day! So we relaunched our site with some of these new capabilities in mind.
Anyway, here's a short video demo of Tidepool: https://youtu.be/2yGTZBAH1T4
Happy to share more about what we've learned from talking to people in the AI space and building Tidepool over the last few months. Feel free to leave your questions in the comments!