v1.0 Production Ready

High-Performance
LSM Storage Engine

A C++17 key-value store engineered for speed. Sub-10μs latency, crash safety via WAL, and efficient on-disk serialization.

128K
Ops / Sec
7.8μs
Latency
C++17
Core

Core Vault

Blazing Fast

Engineered to maximize IOPS through optimized in-memory structures and sequential disk writes.

LSM Architecture

Classic Log-Structured Merge-tree design balancing write throughput with read efficiency.

Crash Safety

Write-Ahead Log (WAL) integration ensures acknowledged writes survive power failures.

Binary Storage

Custom binary SSTable format with sorted keys for O(log n) lookup performance.

Performance Metrics

Stress tested on commodity hardware

> ./embeddb_bench --full-suite

Sequential Writes

PASSED
Throughput 128,456 ops/s
Latency 7.78 μs
Data Size 32MB
./embeddb 1000000 32 0

Random Writes

PASSED
Throughput 112,268 ops/s
Latency 8.90 μs
Shuffle Enabled
./embeddb 1000000 32 1

High Volume

PASSED
Throughput 122,203 ops/s
Operations 2,000,000
SSTables 21 Files
./embeddb 2000000 16 1

System Architecture

The lifecycle of data within EmbedDb

1

Write Request

Client sends key-value pair

2

WAL Append

Durability log update

3

Memtable

In-memory sorting

4

SSTable Flush

Immutable disk storage
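
The four-step lifecycle above can be sketched in code. This is a simplified standalone illustration, not EmbedDb's actual API; the names `MiniWal` and `MiniMemtable` are hypothetical:

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Step 2: append each record to the durability log before acknowledging.
// Hypothetical simplified type, not the real EmbedDb WAL class.
class MiniWal {
public:
    explicit MiniWal(const std::string& path)
        : out_(path, std::ios::binary | std::ios::app) {}

    void append(const std::string& key, const std::string& val) {
        const auto k = static_cast<uint32_t>(key.size());
        const auto v = static_cast<uint32_t>(val.size());
        out_.write(reinterpret_cast<const char*>(&k), sizeof(k));
        out_.write(key.data(), k);
        out_.write(reinterpret_cast<const char*>(&v), sizeof(v));
        out_.write(val.data(), v);
        out_.flush();  // durability point: the record survives a crash after this
    }

private:
    std::ofstream out_;
};

// Step 3: buffer writes in a sorted in-memory structure until flush.
class MiniMemtable {
public:
    void put(const std::string& key, const std::string& val) { data_[key] = val; }
    std::size_t size() const { return data_.size(); }
    // Already key-sorted, so an SSTable flush (step 4) is one sequential pass.
    const std::map<std::string, std::string>& sorted() const { return data_; }

private:
    std::map<std::string, std::string> data_;
};
```

A write request (step 1) thus becomes `wal.append(k, v)` followed by `mem.put(k, v)`; once the memtable grows large enough, its sorted contents are flushed to an immutable SSTable.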

Engineering Report

Implementation Details

The core challenge of EmbedDb was implementing a robust binary format for the Sorted String Tables (SSTables). Unlike text-based formats (CSV/JSON), the binary format enables random access patterns and significantly reduces storage overhead.

sstable.cpp
void SSTable::flush(const std::map<std::string, std::string>& data) {
    std::ofstream file(filename, std::ios::binary);

    // Header: Entry Count
    const auto count = static_cast<uint32_t>(data.size());
    file.write(reinterpret_cast<const char*>(&count), sizeof(count));

    // Body: [K_Len][Key][V_Len][Val]...
    for (const auto& [key, val] : data) {
        const auto k_len = static_cast<uint32_t>(key.size());
        const auto v_len = static_cast<uint32_t>(val.size());

        file.write(reinterpret_cast<const char*>(&k_len), sizeof(k_len));
        file.write(key.data(), k_len);
        file.write(reinterpret_cast<const char*>(&v_len), sizeof(v_len));
        file.write(val.data(), v_len);
    }
}
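
For illustration, the inverse operation, reading a table in the `[count][k_len][key][v_len][val]...` layout back into a sorted map, can be sketched as a standalone round-trip. This is a simplified sketch, not the exact EmbedDb reader; `writeTable` and `readTable` are illustrative free functions:

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Sketch: serialize a sorted map in the [count][k_len][key][v_len][val]... layout.
void writeTable(const std::string& filename,
                const std::map<std::string, std::string>& data) {
    std::ofstream file(filename, std::ios::binary);
    const auto count = static_cast<uint32_t>(data.size());
    file.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& [key, val] : data) {
        const auto k_len = static_cast<uint32_t>(key.size());
        const auto v_len = static_cast<uint32_t>(val.size());
        file.write(reinterpret_cast<const char*>(&k_len), sizeof(k_len));
        file.write(key.data(), k_len);
        file.write(reinterpret_cast<const char*>(&v_len), sizeof(v_len));
        file.write(val.data(), v_len);
    }
}

// Sketch: deserialize the same layout back into a sorted map.
std::map<std::string, std::string> readTable(const std::string& filename) {
    std::ifstream file(filename, std::ios::binary);
    std::map<std::string, std::string> data;
    uint32_t count = 0;
    file.read(reinterpret_cast<char*>(&count), sizeof(count));
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t k_len = 0, v_len = 0;
        file.read(reinterpret_cast<char*>(&k_len), sizeof(k_len));
        std::string key(k_len, '\0');
        file.read(key.data(), k_len);  // std::string::data() is writable in C++17
        file.read(reinterpret_cast<char*>(&v_len), sizeof(v_len));
        std::string val(v_len, '\0');
        file.read(val.data(), v_len);
        data.emplace(std::move(key), std::move(val));
    }
    return data;
}
```

Because length prefixes precede every key and value, the reader never scans for delimiters, which is what makes the binary format cheaper to parse than CSV or JSON.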

Design Strategy

I chose the LSM-Tree structure specifically for its sequential-write behavior. By buffering writes in memory (the Memtable) and flushing to disk only in sorted, sequential batches, we bypass the random-I/O bottleneck typical of B-Tree databases on spinning disks; the benefit persists on NVMe SSDs, where sequential writes align well with flash block-erase mechanics.
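
The buffering strategy can be sketched as a size-threshold flush: puts arrive in arbitrary order, but each emitted batch is already sorted and ready for one sequential disk write. The class and threshold below are illustrative, not EmbedDb's real implementation:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Sketch: random-order puts accumulate in a sorted in-memory buffer; once the
// buffer crosses a threshold, it is emitted as one sorted batch (in EmbedDb,
// an SSTable flush). Names and the threshold mechanism are illustrative.
class BufferedWriter {
public:
    explicit BufferedWriter(std::size_t threshold) : threshold_(threshold) {}

    void put(const std::string& key, const std::string& val) {
        buffer_[key] = val;  // O(log n) insert; keys stay sorted in memory
        if (buffer_.size() >= threshold_) flush();
    }

    void flush() {
        if (buffer_.empty()) return;
        batches_.push_back(buffer_);  // stand-in for a sequential SSTable write
        buffer_.clear();
    }

    // Each batch is key-sorted regardless of the original insertion order.
    const std::vector<std::map<std::string, std::string>>& batches() const {
        return batches_;
    }

private:
    std::size_t threshold_;
    std::map<std::string, std::string> buffer_;
    std::vector<std::map<std::string, std::string>> batches_;
};
```

The disk only ever sees append-style batch writes; the random-access cost is paid in memory, where it is cheap.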

Ready to explore the code?

Dive into the C++ source code to see how the WAL, Memtable, and SSTable components interact.

Visit Repository