EmbedDb | High-Performance Storage Engine

Core Vault

Blazing Fast

Engineered to maximize IOPS by buffering writes in memory and flushing them to disk sequentially.

LSM Architecture

Classic Log-Structured Merge-tree design balancing write throughput with read efficiency.

Crash Safety

Write-Ahead Log (WAL) integration records each write before it is applied, so writes already in the log can survive a crash or power failure.

Binary Storage

Custom binary SSTable format with sorted keys for O(log n) lookup performance.
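
The O(log n) figure follows from keeping keys sorted, which allows binary search. A minimal sketch of that lookup, assuming the table's entries are already available as a sorted in-memory sequence; the `Entry` alias and `find_value` function are hypothetical names for illustration, not EmbedDb's API:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory view of one SSTable: (key, value) pairs sorted by key.
using Entry = std::pair<std::string, std::string>;

// Binary search over sorted entries: O(log n) key comparisons.
std::optional<std::string> find_value(const std::vector<Entry>& sorted_entries,
                                      const std::string& key) {
    // Debug-only precondition check (O(n)); compiled out when NDEBUG is defined.
    assert(std::is_sorted(
        sorted_entries.begin(), sorted_entries.end(),
        [](const Entry& a, const Entry& b) { return a.first < b.first; }));

    // lower_bound returns the first entry whose key is not less than `key`.
    auto it = std::lower_bound(
        sorted_entries.begin(), sorted_entries.end(), key,
        [](const Entry& entry, const std::string& k) { return entry.first < k; });

    if (it != sorted_entries.end() && it->first == key) {
        return it->second;
    }
    return std::nullopt;  // key is not in this table
}
```

Searching the file directly would also need a way to jump to the i-th record, such as an offset index, because the records are variable-length.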

Performance Metrics

Stress tested on commodity hardware

> ./embeddb_bench --full-suite

Sequential Writes

PASSED

Throughput 128,456 ops/s

Latency 7.78 μs

Data Size 32 MB

./embeddb 1000000 32 0

Random Writes

PASSED

Throughput 112,268 ops/s

Latency 8.90 μs

Shuffle Enabled

./embeddb 1000000 32 1

High Volume

PASSED

Throughput 122,203 ops/s

Operations 2,000,000

SSTables 21 Files

./embeddb 2000000 16 1
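
The latency figures above agree, to within 0.01 μs, with the reciprocal of the reported throughput, and the 32 MB data size matches 1,000,000 operations of 32 bytes each. A minimal sketch of that arithmetic, assuming operations run back to back on a single thread and that the second command-line argument is the value size in bytes (neither is stated on this page); both function names are hypothetical:

```cpp
#include <cassert>

// Mean time per operation in microseconds, assuming operations are executed
// one after another (an assumption; the benchmark method is not documented here).
double mean_latency_us(double ops_per_second) {
    assert(ops_per_second > 0.0);
    return 1e6 / ops_per_second;
}

// Value payload in megabytes (1 MB = 1,000,000 bytes), ignoring keys and
// per-record length fields. Assumes a fixed value size per operation.
double payload_megabytes(unsigned long long operations,
                         unsigned long long value_bytes) {
    return static_cast<double>(operations * value_bytes) / 1e6;
}
```

For example, `mean_latency_us(128456.0)` is roughly 7.78 and `payload_megabytes(1000000, 32)` is 32.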

System Architecture

The lifecycle of data within EmbedDb

1. Write Request: Client sends key-value pair

2. WAL Append: Durability log update

3. Memtable: In-memory sorting

4. SSTable Flush: Immutable disk storage

Engineering Report

Implementation Details

The core challenge of EmbedDb was implementing a robust binary format for the Sorted String Tables (SSTables). Unlike text-based formats (CSV/JSON), a length-prefixed binary format needs no delimiters or escaping, lets a reader skip over a record without parsing its contents, and reduces storage overhead.

sstable.cpp

void SSTable::flush(const std::map<std::string, std::string>& data) {
    std::ofstream file(filename, std::ios::binary);

    // Header: entry count (uint32_t, host byte order)
    const auto count = static_cast<uint32_t>(data.size());
    file.write(reinterpret_cast<const char*>(&count), sizeof(count));

    // Body: [K_Len][Key][V_Len][Val]...
    // std::map iterates in sorted key order, so records land on disk sorted.
    for (const auto& [key, val] : data) {
        const auto k_len = static_cast<uint32_t>(key.size());
        const auto v_len = static_cast<uint32_t>(val.size());

        file.write(reinterpret_cast<const char*>(&k_len), sizeof(k_len));
        file.write(key.data(), k_len);
        ...
    }
}

Design Strategy

I chose the LSM-tree structure specifically for its sequential write pattern. By buffering writes in memory (Memtable) and flushing them to disk only in sequential batches, we bypass the random I/O bottleneck typical of B-tree databases on spinning disks. The benefit persists on NVMe SSDs, where flash is erased in whole blocks and large sequential writes suit the device better than scattered small updates.

Ready to explore the code?

Dive into the C++ source code to see how the WAL, Memtable, and SSTable components interact.
