cppalliance / nudb

NuDB: A fast key/value insert-only database for SSD drives in C++11

License: Boost Software License 1.0

nudb's Introduction

NuDB

Documentation, Drone build status, codecov.io coverage, and license badges are provided for the master and develop branches.

A Key/Value Store For SSDs


Introduction

NuDB is an append-only, key/value store specifically optimized for random read performance on modern SSDs or equivalent high-IOPS devices. The most common application for NuDB is content-addressable storage, where a cryptographic digest of the data is used as the key. Read performance and memory usage are independent of the size of the database. Other features include:

  • Low memory footprint
  • Database size up to 281TB
  • All keys are the same size
  • Append-only, no update or delete
  • Value sizes from 1 to 2^32 bytes (4GB)
  • Performance independent of growth
  • Optimized for concurrent fetch
  • Key file can be rebuilt if needed
  • Inserts are atomic and consistent
  • Data file may be efficiently iterated
  • Key and data files may be on different devices
  • Hardened against algorithmic complexity attacks
  • Header-only, no separate library to build

Description

This software is close to final. Interfaces are stable. For recent changes see the CHANGELOG.

NuDB has been in use for over a year on production servers running rippled, with database sizes over 3 terabytes.

Requirements

  • Boost 1.69 or higher
  • C++11 or greater
  • SSD drive, or equivalent device with high IOPS

These components are required only to build the tests and examples:

  • CMake 3.7.2 or later (optional)
  • Properly configured bjam/b2 (optional)

Example

This complete program creates a database, opens the database, inserts several key/value pairs, fetches the key/value pairs, closes the database, then erases the database files. Source code for this program is located in the examples directory.

#include <nudb/nudb.hpp>
#include <cstddef>
#include <cstdint>

int main()
{
    using namespace nudb;
    std::size_t constexpr N = 1000;
    using key_type = std::uint32_t;
    error_code ec;
    auto const dat_path = "db.dat";
    auto const key_path = "db.key";
    auto const log_path = "db.log";
    create<xxhasher>(
        dat_path, key_path, log_path,
        1,                   // appnum
        make_salt(),         // salt
        sizeof(key_type),    // key size
        block_size("."),     // block size of the device holding "."
        0.5f,                // load factor
        ec);
    store db;
    db.open(dat_path, key_path, log_path, ec);
    char data = 0;
    // Insert
    for(key_type i = 0; i < N; ++i)
        db.insert(&i, &data, sizeof(data), ec);
    // Fetch
    for(key_type i = 0; i < N; ++i)
        db.fetch(&i,
            [&](void const* buffer, std::size_t size)
        {
            // do something with buffer, size
        }, ec);
    db.close(ec);
    erase_file(dat_path);
    erase_file(key_path);
    erase_file(log_path);
}

Building

NuDB is header-only so there are no libraries to build. To use it in your project, simply copy the NuDB sources to your project's source tree (alternatively, bring NuDB into your Git repository using the git subtree or git submodule commands). Then, edit your build scripts to add the include/ directory to the list of paths checked by the C++ compiler when searching for includes. NuDB #include lines will look like this:

#include <nudb/nudb.hpp>

To link your program successfully, you'll need to link with the Boost.Thread and Boost.System libraries. Please visit the Boost documentation for instructions on how to do this for your particular build system.

NuDB tests require Beast, and the benchmarks require RocksDB. These projects are linked to the repository using git submodules. Before building the tests or benchmarks, these commands should be issued at the root of the repository:

git submodule init
git submodule update

For the examples and tests, NuDB provides build scripts for Boost.Build (b2) and CMake. To generate build scripts using CMake, execute these commands at the root of the repository (project and solution files will be generated for Visual Studio users):

cd bin
cmake ..                                    # for 32-bit Windows builds

cd ../bin64
cmake ..                                    # for Linux/Mac builds, OR
cmake -G"Visual Studio 14 2015 Win64" ..    # for 64-bit Windows builds

To build with Boost.Build, it is necessary to have the b2 executable in your path. And b2 needs to know how to find the Boost sources. The easiest way to do this is make sure that the version of b2 in your path is the one at the root of the Boost source tree, which is built when running bootstrap.sh (or bootstrap.bat on Windows).

Once b2 is in your path, simply run b2 in the root of the NuDB repository to automatically build the required Boost libraries if they are not already built, build the examples, then build and run the unit tests.

On OSX it may be necessary to pass "toolset=clang" on the b2 command line. Alternatively, this may be set in site-config.jam or user-config.jam.

The files in the repository are laid out as follows:

./
    bench/          Holds the benchmark sources and scripts
    bin/            Holds executables and project files
    bin64/          Holds 64-bit Windows executables and project files
    examples/       Holds example program source code
    extras/         Additional APIs, may change
    include/        Add this to your compiler includes
        nudb/
    test/           Unit tests and benchmarks
    tools/          Holds the command line tool sources

Algorithm

Three files are used.

  • The data file holds keys and values stored sequentially and size-prefixed.
  • The key file holds a series of fixed-size bucket records forming an on-disk hash table.
  • The log file stores bookkeeping information used to restore consistency when an external failure occurs.

In typical cases a fetch costs one I/O cycle to consult the key file, and if the key is present, one I/O cycle to read the value.

Usage

Callers must define these parameters when creating a database:

  • KeySize: The size of a key in bytes.
  • BlockSize: The physical size of a key file record.

The ideal block size matches the sector size or block size of the underlying physical media that holds the key file. Functions are provided to return a best estimate of this value for a particular device, but a default of 4096 should work for typical installations. The implementation tries to fit as many entries as possible in a key file record, to maximize the amount of useful work performed per I/O.

  • LoadFactor: The desired fraction of bucket occupancy

LoadFactor is chosen to make bucket overflows unlikely without sacrificing bucket occupancy. A value of 0.50 seems to work well with a good hash function.
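
As a rough illustration, BlockSize together with the bucket record layout documented in the Formats section below determines how many entries fit in one bucket. The sketch that follows is only an estimate based on those documented field sizes, not the library's internal calculation; with a LoadFactor of 0.5, roughly half of those slots are occupied on average before a bucket is split.

#include <cstddef>

// Rough bucket-capacity estimate, assuming the bucket record layout in the
// Formats section: a 2-byte Count plus 6-byte Spill header, followed by
// 18-byte entries (uint48 Offset + uint48 Size + uint48 Hash).
constexpr std::size_t bucket_capacity(std::size_t block_size)
{
    return (block_size - (2 + 6)) / (6 + 6 + 6);
}

static_assert(bucket_capacity(4096) == 227, "entries per 4KB bucket record");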

Callers must also provide these parameters when a database is opened:

  • Appnum: An application-defined integer constant which can be retrieved later from the database [TODO].
  • AllocSize: A significant multiple of the average data size.

Memory is recycled to improve performance, so NuDB needs AllocSize as a hint about the average size of the data being inserted. For an average data size of 1KB (one kilobyte), AllocSize of sixteen megabytes (16MB) is sufficient. If the AllocSize is too low, the memory recycler will not make efficient use of allocated blocks.

Two operations are defined: fetch, and insert.

fetch

The fetch operation retrieves a variable length value given the key. The caller supplies a factory used to provide a buffer for storing the value. This interface allows custom memory allocation strategies.
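
For example, here is a minimal sketch, using the same store interface as the example program above, that copies a fetched value into a std::string through the callback:

#include <nudb/nudb.hpp>
#include <cstddef>
#include <cstdint>
#include <string>

// Copy the value stored under `key` into a std::string via the fetch
// callback. If the key is not found, the callback is not invoked, ec is
// set, and an empty string is returned.
std::string fetch_value(nudb::store& db, std::uint32_t key, nudb::error_code& ec)
{
    std::string value;
    db.fetch(&key,
        [&value](void const* buffer, std::size_t size)
        {
            value.assign(static_cast<char const*>(buffer), size);
        }, ec);
    return value;
}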

insert

insert adds a key/value pair to the store. Value data must contain at least one byte. Duplicate keys are disallowed. Insertions are serialized, which means [TODO].
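
A minimal sketch of an insert wrapper follows; it relies only on the documented guarantee that a duplicate key causes the call to fail and report an error through ec, without naming a specific error enumerator:

#include <nudb/nudb.hpp>
#include <cstdint>
#include <string>

// Insert value under key. Value data must contain at least one byte.
// A second insert with the same key fails and reports an error through
// ec rather than overwriting the stored value.
bool insert_once(nudb::store& db, std::uint32_t key,
    std::string const& value, nudb::error_code& ec)
{
    db.insert(&key, value.data(), value.size(), ec);
    return !ec;
}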

Implementation

All insertions are buffered in memory, with inserted values becoming immediately discoverable in subsequent or concurrent calls to fetch. Periodically, buffered data is safely committed to disk files using a separate dedicated thread associated with the database. This commit process takes place at least once per second, or more often during a detected surge in insertion activity. In the commit process the key/value pairs receive the following treatment:

An insertion is performed by appending a value record to the data file. The value record has some header information, including the size of the data and a copy of the key; the data file is iterable without the key file. The value data follows the header. The data file is append-only and immutable: once written, bytes are never changed.

Initially the hash table in the key file consists of a single bucket. When insertions cause the load factor to be exceeded, the hash table grows in size by one bucket by performing a "split". The split operation is the linear hashing algorithm as described by Litwin and Larson.

When a bucket is split, each key is rehashed and either remains in the original bucket or is moved to a new bucket appended to the end of the key file.
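
The bucket selection follows the usual linear hashing scheme. The sketch below is illustrative rather than NuDB's exact internal code: buckets is the current number of buckets and modulus is the smallest power of two greater than or equal to buckets.

#include <cstddef>
#include <cstdint>

// Classic linear hashing (Litwin/Larson): hash into the doubled table; if
// that bucket has not been created yet, fall back to its "parent" bucket
// in the half-sized table.
inline std::size_t bucket_index(
    std::uint64_t hash, std::size_t buckets, std::size_t modulus)
{
    std::size_t n = hash % modulus;
    if(n >= buckets)
        n -= modulus / 2;
    return n;
}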

An insertion on a full bucket first triggers the "spill" algorithm.

First, a spill record is appended to the data file, containing header information followed by the entire bucket record. Then the bucket's size is set to zero and the offset of the spill record is stored in the bucket. At this point the insertion may proceed normally, since the bucket is empty. Spilled buckets in the data file are always full.

Because every bucket holds the offset of the next spill record in the data file, the buckets form a linked list. In practice, careful selection of capacity and load factor will keep the percentage of buckets with one spill record to a minimum, with no bucket requiring two spill records.

The implementation of fetch is straightforward: first the bucket in the key file is checked, then each spill record in the linked list of spill records is checked, until the key is found or there are no more records. As almost all buckets have no spill records, the average fetch requires one I/O (not including reading the value).

One complication in the scheme is when a split occurs on a bucket that has one or more spill records. In this case, both the bucket being split and the new bucket may overflow. This is handled by performing the spill algorithm for each overflow that occurs. The new buckets may have one or more spill records each, depending on the number of keys that were originally present.

Because the data file is immutable, a bucket's original spill records are no longer referenced after the bucket is split. These blocks of data in the data file are unrecoverable wasted space. Correctly configured databases can have a typical waste factor of 1%, which is acceptable. These unused bytes can be removed by an off-line process that visits each value in the data file and inserts it into a new database, then deletes the old database and uses the new one instead.

Recovery

To provide atomicity and consistency, a log file associated with the database stores information used to roll back partial commits.

Iteration

Each record in the data file is prefixed with a header identifying whether it is a value record or a spill record, along with the size of the record in bytes and, for value records, a copy of the key. Values can therefore be iterated by advancing a byte offset through the file, and a key file can be regenerated from just the data file by iterating the values and performing the key insertion algorithm.
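
A minimal sketch of such an iteration follows, based solely on the record formats documented in the Formats section below (the 92-byte data file header, big-endian fields, and the value and spill record layouts); real code should prefer the library's own facilities:

#include <cstddef>
#include <cstdint>
#include <fstream>

// Read a big-endian uint48 (6 bytes) from the stream.
inline std::uint64_t read_uint48(std::istream& is)
{
    unsigned char b[6];
    is.read(reinterpret_cast<char*>(b), sizeof(b));
    std::uint64_t v = 0;
    for(unsigned char c : b)
        v = (v << 8) | c;
    return v;
}

// Read a big-endian uint16 (2 bytes) from the stream.
inline std::uint16_t read_uint16(std::istream& is)
{
    unsigned char b[2];
    is.read(reinterpret_cast<char*>(b), sizeof(b));
    return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}

// Count the value records in a data file by walking the record headers.
// key_size must match the database's KeySize.
std::size_t count_values(char const* dat_path, std::size_t key_size)
{
    std::ifstream is(dat_path, std::ios::binary);
    is.seekg(92);                           // skip the data file header
    std::size_t n = 0;
    for(;;)
    {
        auto const size = read_uint48(is);  // value record size, or zero
        if(! is)
            break;
        if(size == 0)
        {
            // Spill record: uint16 size in bytes, then the spilled bucket
            auto const spill = read_uint16(is);
            is.seekg(spill, std::ios::cur);
        }
        else
        {
            // Value record: key followed by `size` bytes of value data
            is.seekg(static_cast<std::streamoff>(key_size + size), std::ios::cur);
            ++n;
        }
    }
    return n;
}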

Concurrency

Locks are never held during disk reads and writes. Fetches are fully concurrent, while inserts are serialized. Inserts fail on duplicate keys, and are atomic: they either succeed immediately or fail. After an insert, the key is immediately visible to subsequent fetches.
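
For example, here is a minimal sketch of issuing fetches from several threads at once, reusing the store and the std::uint32_t keys from the example program above:

#include <nudb/nudb.hpp>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Fetch keys [0, nkeys) concurrently; each thread performs its own fetches
// with its own error_code. Inserts, by contrast, are serialized internally.
void fetch_concurrently(nudb::store& db, std::uint32_t nkeys, unsigned nthreads)
{
    std::vector<std::thread> workers;
    for(unsigned t = 0; t < nthreads; ++t)
        workers.emplace_back([&db, t, nkeys, nthreads]
        {
            for(std::uint32_t key = t; key < nkeys; key += nthreads)
            {
                nudb::error_code ec;
                db.fetch(&key,
                    [](void const*, std::size_t)
                    {
                        // Keep this callback brief; see the fetch-callback
                        // issue reported below.
                    }, ec);
            }
        });
    for(auto& w : workers)
        w.join();
}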

Formats

All integer values are stored as big endian. The uint48_t format consists of 6 bytes.

Key File

The Key File contains the Header followed by one or more fixed-length Bucket Records.

Header (104 bytes)

char[8]         Type            The characters "nudb.key"
uint16          Version         Holds the version number
uint64          UID             Unique ID generated on creation
uint64          Appnum          Application defined constant
uint16          KeySize         Key size in bytes

uint64          Salt            A random seed
uint64          Pepper          The salt hashed
uint16          BlockSize       Size of a file block in bytes

uint16          LoadFactor      Target fraction in 65536ths

uint8[56]       Reserved        Zeroes
uint8[]         Reserved        Zero-pad to block size

Type identifies the file as belonging to nudb. UID is generated randomly when the database is created, and this value is stored in the data and log files as well - it's used to determine if files belong to the same database. Salt is generated when the database is created and helps prevent complexity attacks; it is prepended to the key material when computing a hash, or used to initialize the state of the hash function. Appnum is an application defined constant set when the database is created. It can be used for anything, for example to distinguish between different data formats.

Pepper is computed by hashing Salt using a hash function seeded with the salt. This is used to fingerprint the hash function used. If a database is opened and the fingerprint does not match the hash calculation performed using the template argument provided when constructing the store, an exception is thrown.
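
A sketch of that fingerprint calculation follows, assuming a Hasher that is constructible from a 64-bit seed and callable on a buffer (as with nudb::xxhasher); note the issue reported below about the calculation currently using the machine's native integer representation:

#include <cstdint>

// Compute the pepper: construct the hasher seeded with the salt, then hash
// the salt itself. Hasher is the same type passed to create<Hasher>(...).
template<class Hasher>
std::uint64_t pepper(std::uint64_t salt)
{
    Hasher h(salt);                  // seed the hash function with the salt
    return h(&salt, sizeof(salt));   // hash the salt's own bytes
}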

The header for the key file contains the File Header followed by the information above. The Capacity is the number of keys per bucket, and defines the size of a bucket record. The load factor is the target fraction of bucket occupancy.

None of the information in the key file header or the data file header may be changed after the database is created, including the Appnum.

Bucket Record (fixed-length)

uint16              Count           Number of keys in this bucket
uint48              Spill           Offset of the next spill record or 0
BucketEntry[]       Entries         The bucket entries

Bucket Entry

uint48              Offset          Offset in data file of the data
uint48              Size            The size of the value in bytes
uint48              Hash            The hash of the key

Data File

The Data File contains the Header followed by zero or more variable-length Value Records and Spill Records.

Header (92 bytes)

char[8]             Type            The characters "nudb.dat"
uint16              Version         Holds the version number
uint64              UID             Unique ID generated on creation
uint64              Appnum          Application defined constant
uint16              KeySize         Key size in bytes
uint8[64]           (reserved)      Zeroes

UID contains the same value as the UID in the corresponding key file. This is placed in the data file so that key and data files belonging to the same database can be identified.

Data Record (variable-length)

uint48              Size            Size of the value in bytes
uint8[KeySize]      Key             The key.
uint8[Size]         Data            The value data.

Spill Record (fixed-length)

uint48              Zero            All zero, identifies a spill record
uint16              Size            Bytes in spill bucket (for skipping)
Bucket              SpillBucket     Bucket Record

Log File

The Log file contains the Header followed by zero or more fixed size log records. Each log record contains a snapshot of a bucket. When a database is not closed cleanly, the recovery process applies the log records to the key file, overwriting data that may be only partially updated with known good information. After the log records are applied, the data and key files are truncated to the last known good size.

Header (62 bytes)

char[8]             Type            The characters "nudb.log"
uint16              Version         Holds the version number
uint64              UID             Unique ID generated on creation
uint64              Appnum          Application defined constant
uint16              KeySize         Key size in bytes

uint64              Salt            A random seed.
uint64              Pepper          The salt hashed
uint16              BlockSize       Size of a file block in bytes

uint64              KeyFileSize     Size of key file.
uint64              DataFileSize    Size of data file.

Log Record

uint64_t            Index           Bucket index (0-based)
Bucket              Bucket          Compact Bucket record

Compact bucket records contain only the entries actually present, rather than a full fixed-size bucket record. They are primarily used to minimize the volume of writes to the log file.

License

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Contact

Please report issues or questions here: https://github.com/vinniefalco/NuDB/issues

nudb's People

Contributors

joelkatz, kevinstubbs, markusteufelberger, mellery451, miguelportilla, movitto, mr-leshiy, sdarwin, seelabs, vinniefalco


nudb's Issues

Fix varint encoding

varint has a bug where it's not quite bit-compatible with Google's varint.

fetch callback needs documented requirements

The documentation needs to be explicit about the restrictions and limitations of what can be performed in the fetch callback:

From
https://www.reddit.com/r/cpp/comments/60px64/nudb_100_released_a_keyvalue_database_for_ssds/dfaj3z7/

db.insert(the_key, ...);
db.fetch(the_key, [&](void const* buffer, std::size_t size) {
    unexpectedly_slow_operation();
    // wow that really took a long time
    //
    // did you know that we're still holding the internal rwlock?
    //
    // it's "never held during disk reads and writes"
    // but our request was fetched from the write queue,
    // which is probed by every reader and writer in order to
    // maintain consistency
    //
    // now guess who is currently waiting to get exclusive access...
    // ...it's the background worker!
    //
    // Q: are other readers still able to acquire the rwlock?
    // A: of course not, we don't want writer starvation :)
    //
    // TL;DR: a reader can block the whole system forever ... :D
    db.fetch(some_other_key, [&](...) { /* let's deadlock */ });                
}, ...);

any info on tps for NuDB?

Hello,

I am new to NuDB and am researching a super high-performance key/value database for a blockchain project.

I would like to get some information on the TPS speeds that NuDB is capable of handling, if possible.

Thanks and have a great day

Document serialized inserts in readme

Just a heads-up, since there's still a "TODO" in there in an awkward location ("Insertions are serialized, which means [TODO]") and it might be easier to track a github issue. :)

Final review of error codes

Need to take stock of all detected errors in the verify and recover functions, and make sure they are distinguished and appropriate. Also look into open and create to see if we should have more detailed error codes.

git submodule update failed

─➤ git submodule update
error: Server does not allow request for unadvertised object 335dbf9c3613e997ed56d540cc8c5ff2e28cab2d
Fetched in submodule path 'doc/docca', but it did not contain 335dbf9c3613e997ed56d540cc8c5ff2e28cab2d. Direct fetching of that commit failed.

Promote NuDB!

After launch, NuDB needs to be promoted on social media and places like StackOverflow, programming forums, etc

Need move members

No reason not to have at least MoveConstructible, if not also MoveAssignable

Rethink this Codec template argument

Many of the API routines require that the Codec type is default constructed, they don't offer a way to copy construct an existing codec, making stateful codecs impossible. But do we even need this template argument? It complicates tooling and should be revisited.

Enforce integer limits

insert should require value sizes < 2^32, and not allow the number of buckets to exceed 2^32-1.

consolidate verify() functions

The verify interface should be consolidated into one function that takes bufferSize and progress, and intelligently calls either the slow or the fast algorithm depending on the buffer size and whether or not it would result in a speedup.

Open details to the public API

In order that authors be able to write tools for nudb, details such as the file format types (e.g. detail::dat_file_header) and the functions to interact with it need to be part of the public interface.

README.md docs out of date

The docs refer to AllocSize which has been removed

The README.md could also mention it's been running in rippled for over 2 years on production servers.

Versioning namespace

Need inline namespace so that breaking changes can be versioned. Or an alternative to inline namespaces that allows the choice of version to be selected at runtime.

Add extras/

nudb could use the extras/nudb folder, containing useful items like the progress functor which we don't want to bake into the public interface but still make available with the understanding that they might change.

Can NuDB work on HDD?

I am using a very old 64-bit computer and I want to try NuDB. I only have a HDD. Can I use it?

typo

README.md reads "licence"

Use error_code along side exceptions

nudb should use and offer boost::system::error_code interfaces for all functions natively. The existing APIs which throw exceptions should just call the versions which return error codes and rethrow the error as boost::system::system_error (like Asio does).

An open question is how a caller can determine, in cross-platform fashion, the outcome of certain file operations. For example, that the file did not exist.

'split' command

Need a command to split a database into two. This could work conceptually by iterating the data file and writing a new data/key file until reaching the halfway point in terms of file size, then continuing the iteration and writing the second data/key file. In practice, to make this work in reasonable time frames (days instead of weeks), it would need to hold large portions of the key file in memory, the way that rekey does.

Also, could split N ways where N is a parameter.

Is single-threaded operation deterministic?

As far as I understand the design, if there are a lot of writes happening, buckets might overflow and log files might be written until the pressure goes down a bit.

However, if only a single thread writes to a database, inserting all elements in a certain order, is there a guarantee that the database file(s) will be bitwise identical (if the same salt/pepper is used), or might there be other effects in play that could cause non-deterministic behaviour?

Calculate 32-bit insert/s limit

Can 32-bit reach > 4GB/s insert? If so then code needs to be audited for 32-bit limit correctness (i.e. when std::size_t is 32 bit).

Publish a .vcpkg

It would be great to add this to Microsoft's package system for VC

Computation of pepper should be explicitly little-endian

Due to an oversight, the calculation of pepper uses the native integer representation. Since all databases in existence have been created on little endian machines, the code to compute the pepper should always use a little-endian representation (to avoid making a new file format). The documentation should be updated to reflect this reality

How to calculate the pepper (and other hash values)?

rippled seems to use xxhash64 as hashing function.

I checked the key file from my installation and don't seem to be able to reproduce the correct hash in python using https://github.com/ifduyue/python-xxhash.

import xxhash

salt = int("1234abc", 16)  # <-- of course I enter the actual salt here
realpepper = int("deadbeef", 16)  # <-- same for the pepper here

pepper = xxhash.xxh64(salt.to_bytes(8, byteorder="big"), salt).intdigest()
print(xxhash.xxh64(salt.to_bytes(8, byteorder="big"), salt).hexdigest())

assert (pepper == realpepper)

As far as I understand C++ templates, https://github.com/vinniefalco/nudb/blob/master/include/nudb/detail/format.hpp#L148 should seed the hasher on line 152 with the salt and then calculate the hash of the salt on line 153 (though I'm not so sure why you need the address of the salt?!).

Sporadic Fetch Failures (Multi-Process Support ?)

Greetings! I've been exploring this library in the context of reading data directly from a live rippled nodestore. Periodically, a fetch operation will fail with the 'key_not_found' error. Subsequently trying to fetch the same key using a new database connection succeeds. I'm 100% sure the values exist in the DB before starting this process.

After exploring the code, my primary thought as to the cause is a race condition, where the rippled process is flushing data to the nodestore file just as I'm trying to read the bucket containing the data. This could result in a lookup failure, though admittedly I haven't been able to pinpoint this as being the exact cause. Subsequent lookups of the same key using the same database connection fail until it is closed and reopened, leading me to believe internal in-memory cache buckets are getting out of sync (or similar). Does this sound like it could be the cause of this issue, or if not, any idea what the problem could be?

If this is the issue, care would need to be taken to support multi-process access for this to be resolved. Sqlite3 employs on-disk locking solutions to facilitate this, but even there that is less than ideal. Would there be any interest in adding something similar to this library? For the time being, we can live with closing/reopening the db on failure, but it'd be nice to utilize something more robust!

Appreciate any insights.

Create key file up front during rekey

To "pre-flight" space issues, rekey should create the key file at the final size in advance. The command line tool should document that if the rekey operation fails, the key file should be deleted.

Spurious excessive write latency

I've been analyzing some massive outliers in write latency in rippled's use of NuDB and tracked the problem down to two things that appear to be deficiencies in NuDB. I have a quick and dirty patch that makes the problem go away, but it's not yet suitable for including in NuDB. I'm opening this issue for any input on whether what I'm thinking makes sense, whether it's worth improving my changes for submission, or whether there's a better way to resolve my problem.

The first issue comes from this code in basic_store/insert:

     auto const rate = static_cast<std::size_t>(
         std::ceil(work / elapsed.count()));
     auto const sleep =
         s_->rate && rate > s_->rate;

This code measures the rate at which data is being written to the store since the last flush and imposes delays on threads that write if the rate exceeds the measured write rate. The problem is that this is extremely inaccurate for the first few writes after a flush. During the flush, a mutex is held. When it's released, backed up writes flow to the store. The first write will almost always exceed the sustained write rate, so that thread is almost always made to sleep.

My proposed quick and dirty fix for this is to set a minimum amount of write data that must be backed up before we impose a sleep on the writer.

The second issue comes from this code in basic_store/flush:

            auto const now = clock_type::now();
            auto const elapsed = duration_cast<duration<float>>(
                now > s_->when ? now - s_->when : clock_type::duration{1});
            s_->rate = static_cast<std::size_t>(
                std::ceil(work / elapsed.count()));

This works well if the batch size happens to be large. But if we happen to have a very small batch, the sustained write rate will be severely underestimated due to fixed overhead. For example, writing 100 objects will not take 100 times what writing 1 object takes, it will take much less than that. So small batches result in a severely underestimated sustained write rate, resulting in the next batch being small and long recovery times.

My proposed quick and dirty fix for this is to allow the estimated write rate to increase quickly but force it to decay slowly.

The net effect of these two changes is a massive decrease in latency spikes (on writes) coming from NuDB's write throttling code.

Here is the quick and dirty patch:

diff --git a/include/nudb/impl/basic_store.ipp b/include/nudb/impl/basic_store.ipp
index 4ac79e4..17d39ed 100644
--- a/include/nudb/impl/basic_store.ipp
+++ b/include/nudb/impl/basic_store.ipp
@@ -342,12 +342,13 @@ insert(
     auto const rate = static_cast<std::size_t>(
         std::ceil(work / elapsed.count()));
     auto const sleep =
-        s_->rate && rate > s_->rate;
+        s_->rate && rate > s_->rate && work > 32*1024*1024;
     m.unlock();
 
     // The caller of insert must be blocked when the rate of insertion
     // (measured in approximate bytes per second) exceeds the maximum rate
-    // that can be flushed. The precise sleep duration is not important.
+    // that can be flushed and lots of data is already cached.
+    // The precise sleep duration is not important.
     if(sleep)
         std::this_thread::sleep_for(milliseconds{25});
 }
@@ -768,6 +769,14 @@ flush()
             auto const now = clock_type::now();
             auto const elapsed = duration_cast<duration<float>>(
                 now > s_->when ? now - s_->when : clock_type::duration{1});
+            auto const rate = std::ceil(work / elapsed.count());
+
+            // raise the rate quickly, drop it slowly
+            if (s_->rate <= rate)
+                s_->rate = rate;
+            else
+                s_->rate = (s_->rate * 16 - s_->rate + rate) / 16;
+
             s_->rate = static_cast<std::size_t>(
                 std::ceil(work / elapsed.count()));
         #if NUDB_DEBUG_LOG

Should I try to finalize these for inclusion? Does it sound like I've correctly understood the cause of the latency spikes? Are there better solutions?

Thanks in advance.

Remove Progress Concept

It doesn't need to be a formal concept; it can just be described as a one-liner in the few functions that take it as a parameter.

is_File

Need the is_File concept check
Also is_Callback for visit,
and BufferFactory for fetch

Add 'create' command to nudb tool

A tool command to create a test database filled up to a certain number of gigabytes could be useful for testing and benchmarking.

need rekey unit test

rekey needs its own unit tests. One test should verify correct behavior for both versions of the function (with buffer and without) and the other should verify that the database recovers verifiably on a failure (using fail_file).

store::set_option

Options which people really shouldn't mess with, like the load factor, should be set using a set_option interface instead of clogging the argument list of create with parameters that can only hurt the user when set to non-default values.
