NuDB: A Fast Key/Value Store for SSDs (C++, Open Source)

27 points

10 years ago

Hi, I work at Ripple and we've just finished putting the final touches on a new open source key/value database written in C++. It's called NuDB, and it has these features:

* Header-only, C++11

* For SSDs or high-IOPS devices

* Low memory footprint, zero caches!

* Performance independent of growth!

* Insert-only (no update or delete)

* Data size up to 2^32-1

* Database size up to 2^64-1

* Fault tolerant (uses a rollback file)

* Concurrent, fast reads

This database keeps the SAME performance no matter how big the data set grows!

We were using RocksDB, which was fairly good, but as the size of our distributed ledger grew, performance started to degrade. Some investigation showed that RocksDB allocates memory to cache the various bloom filters and indexes that it needs to implement the log-structured merge algorithm.

We took a step back and said: if we're going to have an insert-only database that is huge (hundreds of terabytes), with a random access pattern (keys uniformly distributed, as when the key is a cryptographic digest such as SHA-256 of the data), then no amount of RAM for caches is going to help.

The thinking was to write a new key/value store that implements a hash table, but on disk. Every lookup for an item would require, on average, only a single I/O to read the block from the key file (and a subsequent I/O to read the value). This is by no means novel; there are other implementations that do this, such as Berkeley DB, Sparkey, et al. But we believe we have invented something new in the treatment of full buckets, one that is performing spectacularly in our production Ripple environment, and we'd like to share it with you:

https://github.com/vinniefalco/NuDB
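
To make the "hash table, but on disk" idea concrete, here is a minimal sketch. It is not NuDB's code or file format: the names `Store`, `Entry`, `Bucket` and `kSlots` are made up for illustration, two in-memory buffers stand in for the key file and the data file, and an insert into a full bucket simply fails, since the interesting part (what NuDB does with full buckets) is not described in this post. What it does show is the access pattern: a fetch reads exactly one fixed-size bucket from the key file, then one record from the data file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout (NOT NuDB's format): one bucket is one fixed-size block.
struct Entry {
    std::uint64_t hash;        // hash of the key; 0 marks an empty slot
    std::uint64_t offset;      // start of the record in the data file
    std::uint32_t key_size;    // record is key bytes followed by value bytes
    std::uint32_t value_size;
};

constexpr std::size_t kSlots = 4;  // entries per bucket (assumed)
struct Bucket { Entry slots[kSlots]; };

class Store {
public:
    std::size_t key_reads = 0;   // simulated I/Os against the key file
    std::size_t data_reads = 0;  // simulated I/Os against the data file

    explicit Store(std::size_t buckets)
        : key_file_(buckets * sizeof(Bucket), 0), buckets_(buckets) {
        assert(buckets > 0);
    }

    // Insert-only: appends the record and fills the first free slot.
    // Duplicate keys are not checked. Returns false if the bucket is full.
    bool insert(const std::string& key, const std::string& value) {
        const std::uint64_t h = hash_of(key);
        Bucket b = read_bucket(h % buckets_);
        for (Entry& e : b.slots) {
            if (e.hash != 0) continue;  // slot already taken
            e.hash = h;
            e.offset = data_file_.size();
            e.key_size = static_cast<std::uint32_t>(key.size());
            e.value_size = static_cast<std::uint32_t>(value.size());
            data_file_ += key;
            data_file_ += value;
            write_bucket(h % buckets_, b);
            return true;
        }
        return false;  // full bucket: no overflow handling in this sketch
    }

    // One bucket read, then a data read for each slot whose hash matches.
    bool fetch(const std::string& key, std::string& value) {
        const std::uint64_t h = hash_of(key);
        ++key_reads;
        const Bucket b = read_bucket(h % buckets_);
        for (const Entry& e : b.slots) {
            if (e.hash != h || e.key_size != key.size()) continue;
            ++data_reads;
            if (data_file_.compare(e.offset, e.key_size, key) == 0) {
                value = data_file_.substr(e.offset + e.key_size, e.value_size);
                return true;
            }
        }
        return false;
    }

private:
    // 64-bit FNV-1a, remapped so that 0 can mean "empty slot".
    static std::uint64_t hash_of(const std::string& key) {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : key) { h ^= c; h *= 1099511628211ULL; }
        return h == 0 ? 1 : h;
    }
    Bucket read_bucket(std::size_t i) const {
        Bucket b;
        std::memcpy(&b, key_file_.data() + i * sizeof(Bucket), sizeof(Bucket));
        return b;
    }
    void write_bucket(std::size_t i, const Bucket& b) {
        std::memcpy(key_file_.data() + i * sizeof(Bucket), &b, sizeof(Bucket));
    }

    std::vector<char> key_file_;  // stand-in for the on-disk key file
    std::string data_file_;       // stand-in for the append-only data file
    std::size_t buckets_;
};
```

With something like `Store db(64);`, a `db.fetch(key, value)` that hits costs one key-file read plus one data-file read no matter how many records have been inserted, which is the property the post is describing. A real store would also need a policy for full buckets and crash safety, both of which this sketch leaves out.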