CrimsonDB: Zero Knobs, High Performance

Self-design

CrimsonDB decides autonomously how to change its core design to adjust to workload, hardware and other parameters. It can assume arbitrary shapes in between the core design classes of log, LSM-tree, B-tree, and their hybrids. It offers optimal reads and writes given a memory budget and an application workload.

Performance

We build CrimsonDB by mapping the whole possible design space of key-value stores. This allows us to discover new designs and optimizations. Through them we push read and write performance towards the optimal behavior, while at the same time discover the rules that govern automation.

What if?

CrimsonDB offers advanced system design exploration features such as allowing to reason about hardware properties and the potential benefit when adding new hardware or what kind of low level system design choices would bring a required performance property for a given workload.

Some of the technology is published in research papers. The Data Calculator paper in SIGMOD 2018 shows how we can synthesize more data structures than stars on the sky to pick the right one for a given problem. The Design continuums paper in CIDR 2019 shows how we can perceive all core NoSQL data structures as a single data structure! The Monkey SIGMOD 2017 and TODS 2018 papers show how to pick the optimal number of bits for each bloom filter in an LSM-tree and how to pick the optimal size ratio and merge policy. The Dostoevsky paper in SIGMOD 2018 offers better trade-offs between lookup costs, merge overheads, and storage space by identifying and removing superfluous merge operations in NoSQL systems.

data

workload

hardware

LSH-Table

Garbage Collection Threshold

Buffer size (MB)

Key Signature Size (bits)

Hash Bucket entries fraction

LSM-Tree

Leveling Tiering

Size ratio

Levels

Fixed FPRs Optimal FPRs

M_bf (bits per entry)

Buffer size (MB)

Fence Pointers' bits per entry

B^ε-tree

Buffer size (MB)

Levels

CrimsonDB: Design Continuum

Growth Factor (T)

Hot merge threshold (K)

Cold merge threshold (Z)

Levels (L)

M_f (bits per entry)

Buffer size (MB)

node size(D)

Write

Range Lookup

Point Lookup

Storage Space

Memory Footprint

Throughput

Publications

Niv Dayan, Stratos Idreos
The Log-Structured Merge-Bush & the Wacky Continuum.
In ACM SIGMOD International Conference on Management of Data , 2019

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu
Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn
In Biennial Conference on Innovative Data Systems Research (CIDR), 2019

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Optimal Bloom Filters and Adaptive Merging for LSM-Trees
In ACM Transactions on Database Systems , 2018

Niv Dayan, Stratos Idreos
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging
[paper video]
In ACM SIGMOD International Conference on Management of Data , 2018

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models
[paper video]
In ACM SIGMOD International Conference on Management of Data , 2018

Niv Dayan, Manos Athanassoulis, Stratos Idreos
Monkey: Optimal Navigable Key-Value Store
[paper website] [paper video]
In ACM SIGMOD International Conference on Management of Data , 2017