Two weeks ago I shared an alternative HN UI with built-in LLM-powered summarization for links and discussions (dstill.ai/hackernews, https://news.ycombinator.com/item?id=36760714).
Today I am happy to share further work that lets you summarize and chat with a select set of high-quality podcasts. You can find these features at https://dstill.ai/podcasts and https://dstill.ai/agent.
Video demo: https://www.loom.com/share/1270256b2c9d4d88970b9cadd446ceb3?... (screenshots with more details below)
A bit more context:
The long-term goal of https://dstill.ai is to give you the means to consume and access information in a more thoughtful and highly personal way: making sure you never miss anything that's actually important to you, and cutting all the fluff and distractions that are thrown at us every day. Hopefully that somewhat balances the fact that some companies are set to flood the internet with vacuous AI-generated content.
Today’s release is only a small part of that vision, but I want to make sure we release early and often, and learn along the way.
Why podcasts? I think they are a unique source of highly authentic information that can help balance the generic and impersonal nature of something like ChatGPT. They also pose a challenge, because they are audio, and audio is much harder to work with than text. But more and more content is being produced exclusively in audio or video form, so it’s strategically important to tackle this head-on.
Here’s a quick rundown of the podcasts UI:
- Podcast list: https://screenbud.com/shot/07c31b34-0a1f-4658-a70a-0007b69a5...
- Specific podcast: https://screenbud.com/shot/ca5b067e-428d-4fc7-a6ba-2cd87fe3e...
- Specific podcast episode with summary: https://screenbud.com/shot/7056aa9e-9834-48c7-918a-6eef2694d...
Here are a few example conversations I had recently:
- [Andrew Huberman and Peter Attia on improving cardiovascular fitness.] https://screenbud.com/shot/8fddfb0d-51a6-45c7-bffd-a5644d106...
- [Pros and cons for AI regulation, include quotes and a summary.] https://screenbud.com/shot/d526c793-5b52-45ee-b688-1de78be63...
On the technical level, here’s how things are structured:
- Podcasts are transcribed (using whisper) and diarized
- For summarization:
  - Transcripts are split into chunks if necessary (many podcasts are over 20K tokens), and those chunks are then summarized in a map/reduce fashion
- For chat:
  - Transcripts are split into chunks, which are embedded using instructor-large
  - The embeddings are stored in qdrant, along with the metadata needed for filtering. This is where qdrant shines: it allows you to have *efficient* embedding-based retrieval with filtering (see here for details https://qdrant.tech/articles/filtrable-hnsw/)
  - There’s one LLM that generates search queries (if any); those are used for retrieval of the podcast chunks
  - Then a second LLM is fed the conversation history along with the search results. It currently relies on ChatGPT, though we are working on custom fine-tuning of llama.
- The UI is TypeScript with SvelteKit — so far I love the simple and intuitive nature of Svelte(Kit).
- The backend is Rust, based on my own (multi-lingual) LLM framework tailored for configurability and observability (I wrote about it a bit here, including some demos of the platform: https://news.ycombinator.com/item?id=36787924#36789075)
- Right now there’s no account system, so you need to supply your own OpenAI API keys.
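
To make the map/reduce summarization step above a bit more concrete, here is a minimal Python sketch. It is not the actual dstill.ai code (the backend is Rust), only an illustration under assumptions: whitespace-separated words stand in for tokens, and `summarize` is a hypothetical callable that would wrap an LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: words approximate tokens; a real pipeline would count
    tokens with the model's own tokenizer.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical LLM wrapper
    max_tokens: int,
) -> str:
    """Summarize text that may not fit into a single model call."""
    chunks = split_into_chunks(text, max_tokens)
    if not chunks:
        return ""
    if len(chunks) == 1:
        # Fits in one call: summarize directly.
        return summarize(chunks[0])

    # Map: summarize every chunk independently.
    partials = [summarize(chunk) for chunk in chunks]

    # Reduce: merge the partial summaries and summarize the result,
    # recursing if the merged text is still too long.
    combined = "\n".join(partials)
    if len(combined.split()) >= len(text.split()):
        # Guard against a summarizer that never shortens its input.
        raise ValueError("summarize() must shorten its input")
    return map_reduce_summarize(combined, summarize, max_tokens)
```

With a real model, `summarize` would send the chunk plus a summarization prompt to the LLM, and `max_tokens` would be chosen to leave room for the prompt and the response.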
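
The retrieval step of the chat flow above (embedding search combined with metadata filtering) can be illustrated in the same spirit. This is a toy sketch, not the real setup: it scans an in-memory list by brute force, whereas qdrant applies the filter inside its HNSW index, and the vectors here are hand-written stand-ins for instructor-large embeddings. The `Chunk` type, the `search` function, and the `podcast` metadata key are all hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Chunk:
    text: str
    vector: Sequence[float]   # stand-in for a real embedding
    metadata: Dict[str, str]  # e.g. podcast name, used for filtering


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    chunks: List[Chunk],
    query_vector: Sequence[float],
    top_k: int,
    must_match: Optional[Dict[str, str]] = None,
) -> List[Chunk]:
    """Return the top_k most similar chunks whose metadata matches the filter."""
    must_match = must_match or {}
    # Keep only chunks whose metadata satisfies every filter condition.
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in must_match.items())
    ]
    # Rank the remaining chunks by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.vector, query_vector),
        reverse=True,
    )
    return ranked[:top_k]
```

In the full flow, the first LLM would produce the search queries, each query would be embedded and passed to something like `search`, and the returned chunk texts would be placed into the second LLM's prompt next to the conversation history.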
There’s still a lot of work to be done on both the podcast and agent experience, but I hope it already provides some value as is. I would be very happy to hear your thoughts and feedback, both on the general vision and on today’s podcast and agent release.
Cheers, Petr.