In 2021, we started building our OSS search engine[0] with the goal of searching directly on object storage and substantially lowering the infrastructure costs of search at large scale.
Two years after our Common Crawl demo hit the front page[1], our latest release, Quickwit 0.6, is finally reaching the point where our engine can be a drop-in replacement for Elasticsearch for logs and traces, with all your data on object storage.
Two years is both very long for a startup and very short when you build a distributed engine. And we decided to do it the hard way: we implemented our own OSS gossip library[2], our own {S3,JSON}-friendly columnar format[3], and of course, we maintain our own search library, tantivy[4], a Lucene equivalent in Rust. From time to time, we also do fun things like language detection[5], always OSS. That is a lot of engineering investment, and it takes time for it to finally reach end users.
I won’t tell you Quickwit is XX times cheaper or faster, as we haven’t yet published an unbiased benchmark. I’m working on it, and it’s really hard to avoid the common pitfalls.
Instead, I will just give two links showing that search and indexing performance is really good, plus comments from two users we discovered on Twitter/HN:
- Search and do analytics on the whole GitHub archive dataset (5.6 billion events ~ 21TB JSON) in less than 2 seconds with a few nodes. See the short demo [6].
- You can achieve an indexing throughput of 1.5GB/s with 40 nodes and it scales linearly [7].
- Nice comment from Arnon Rotem-Gal-Oz, VP, Chief Architect at SAP: « I've just spent the day setting up v.05 on k8s with SSL to Kafka source, VRL transformations, and persistence to Azure -I can see it is shaping up as an alternative to our ES-based log handling. » [8]
- Another nice comment seen on HN: « it seems to be very easy to run, not very IO intensive, and running fine on a single node with modest hardware with >2 billion log rows. It has a really cool dynamic schema feature too. » [9]
Fun fact: at least 4 users are using Garage[10] as their object storage. This OSS project looks really promising and made the HN front page a few months ago[11]. We really cherish OSS for this kind of unexpected combination.
Any feedback, positive or negative, is always greatly appreciated here!
[0] Quickwit repo: https://github.com/quickwit-oss/quickwit
[1] Searching the web under 1000$/month: https://news.ycombinator.com/item?id=27074481
[2] Chitchat gossip library: https://github.com/quickwit-oss/chitchat
[3] Columnar format: https://github.com/quickwit-oss/tantivy/tree/main/columnar
[4] Tantivy library: https://github.com/quickwit-oss/tantivy/
[5] Whichlang library: https://github.com/quickwit-oss/whichlang
[6] GitHub Archive demo in terminal: https://www.youtube.com/watch?v=SNq3bARRlDI
[7] Indexing performance: https://twitter.com/fulmicoton/status/1638016949459488768
[8] Comment from Arnon Rotem-Gal-Oz: https://twitter.com/arnonrgo/status/1645429632303235073?s=20
[9] Comment on HN: https://news.ycombinator.com/item?id=35742544
[10] Garage object storage: https://garagehq.deuxfleurs.fr/
[11] Garage on the HN front page: https://news.ycombinator.com/item?id=33853539