meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦

Home Page: https://docs.rs/heed

License: MIT License

Rust 100.00%

key-value-store memory-mapping lmdb wrapper typed

heed's Introduction

heed

A Rust-centric LMDB abstraction with minimal overhead.

heed enables the storage of various Rust types within LMDB, extending support to include Serde-compatible types.

For usage examples, see heed/examples/.

Building from Source

You can use this command to clone the repository:

git clone --recursive https://github.com/meilisearch/heed.git
cd heed
cargo build

However, if you already cloned it and forgot to initialize the submodules, execute the following command:

git submodule update --init

heed's Issues

OS Error 22: invalid argument when opening transactions on threads

Bug description

When running the following example using heed, the unwraps fail with OS error 22 (invalid argument):

use heed::EnvOpenOptions;

fn main() {
    const NBR_THREADS: usize = 11;
    const NBR_DB: u32 = 100;

    let mut handles = vec![];
    for _i in 0..NBR_THREADS {
        let h = std::thread::spawn(|| {
            let dir = tempfile::tempdir_in(".").unwrap();

            let mut options = EnvOpenOptions::new();
            options.max_dbs(NBR_DB);

            let env = options.open(dir.path()).unwrap();
            for i in 0..NBR_DB {
                env.create_poly_database(Some(&format!("db{i}"))).unwrap();
            }
        });
        handles.push(h);
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("ok!");
}

(see the associated repository for more information)

Raw lmdb reproducer

The issue can be further minimized in C, directly using the master branch (not master3) of lmdb instead of heed, with the following:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include "../lmdb.h"
#include "../midl.h"

#define NBR_THREADS 20
#define NBR_DB 2

void* run(void* param) {
    char* dir_name = (char*) param;
    printf("Starting %s\n", dir_name);
    MDB_env* env;
    mdb_env_create(&env);
    mdb_env_set_maxdbs(env, NBR_DB);
    if (mdb_env_open(env, dir_name, MDB_NOTLS, 0600) != 0) {
        printf("ERROR opening env\n");
        goto exit;
    }

    for (int i=0; i<NBR_DB;++i) {
        char* db_name = malloc(100);
        sprintf(db_name, "db_%i", i);

        MDB_txn* txn;

        /* Keep the return code so the message prints a defined value. */
        int txn_res = mdb_txn_begin(env, NULL, 0, &txn);
        if (txn_res != 0) {
            printf("[%s]ERROR opening txn, %d\n", dir_name, txn_res);
            fprintf(stderr, "errno code: %d ", errno);
            perror("Cause");
            goto exit_loop;
        }

        MDB_dbi db;
        sleep(1);
        mdb_txn_commit(txn);
        free(db_name);
        continue;
exit_loop:
        free(db_name);
        goto exit;
    }
    printf("ok env\n");
exit:
    free(dir_name);
    mdb_env_close(env);

    return NULL;
}

int main(int argc, char** argv) {
    pthread_t threads[NBR_THREADS];
    for (int i = 0; i < NBR_THREADS; ++i) {
        char* dir_name = malloc(100);
        sprintf(dir_name, "tmp_env_%i", i);
        pthread_create(&threads[i], NULL, run, dir_name);
    }
    
    for (int i = 0; i < NBR_THREADS; ++i) {
        void* retval;
        pthread_join(threads[i], &retval);
    }
    printf("ok!\n");
    return 0;
}

(see the associated repository for more information)

Likely related to meilisearch/meilisearch#3017

Think about making the `RoTxn` also `Send` when `NO_TLS` is enabled

Make the RoTxn: Send + Sync in NO_TLS mode, if possible, and rename the sync-read-txn feature flag.

Refactor error type to use `thiserror`

Currently, when I try to propagate a heed error with the ? operator (converting it into anyhow::Error), I get the following:

error[E0277]: `(dyn std::error::Error + 'static)` cannot be shared between threads safely
  --> tools\migrate\src\main.rs:17:68
   |
17 |             "heed" => Self::Heed(HeedDB::new(db::heed::new_db(path)?)),
   |                                                                    ^ `(dyn std::error::Error + 'static)` cannot be shared between threads safely
   |
   = help: the trait `Sync` is not implemented for `(dyn std::error::Error + 'static)`
   = note: required because of the requirements on the impl of `Sync` for `Unique<(dyn std::error::Error + 'static)>`
   = note: required because it appears within the type `Box<(dyn std::error::Error + 'static)>`
   = note: required because it appears within the type `heed::Error`
   = note: required because of the requirements on the impl of `From<heed::Error>` for `anyhow::Error`
   = note: required by `from`

The error type is a composite enum, which is fine, but the compiler reports that it cannot be shared between threads safely because the boxed dyn Error it contains does not implement Sync.

I could also ask to wrangle the auto trait in, but I think the error type would also benefit from better readability and maintainability if it were implemented with thiserror, for example like here
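A minimal sketch of the requested shape, written with the standard library only rather than thiserror. `StoreError` and its variants are hypothetical and are not heed's actual error type. The point it illustrates is that bounding the boxed error with `Send + Sync` makes the whole enum `Send + Sync`, which is what the conversion into `anyhow::Error` requires.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error enum (an assumption, not heed's real `Error`).
#[derive(Debug)]
pub enum StoreError {
    Io(std::io::Error),
    /// The `Send + Sync` bounds are what make the enum thread-safe.
    Encoding(Box<dyn Error + Send + Sync + 'static>),
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::Io(e) => write!(f, "I/O error: {e}"),
            StoreError::Encoding(e) => write!(f, "encoding error: {e}"),
        }
    }
}

impl Error for StoreError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StoreError::Io(e) => Some(e),
            StoreError::Encoding(e) => Some(&**e),
        }
    }
}

/// Only compiles for types that are both `Send` and `Sync`.
pub fn is_send_sync<T: Send + Sync>() -> bool {
    true
}
```

Calling `is_send_sync::<StoreError>()` compiles only because of the extra bounds on the boxed error; with a plain `Box<dyn Error>` the same call is rejected with an E0277 like the one above.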

Support Deserialize<'txn> instead of DeserializeOwned

We should be able to support Deserialize<'txn>, not only DeserializeOwned, when deserializing types from the byte slice returned by LMDB. This improvement would allow much better performance when deserializing types that do not necessarily need to copy the data (e.g. big json of &str).

https://serde.rs/lifetimes.html#the-deserializede-lifetime
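The difference between the two approaches can be sketched without Serde, using plain UTF-8 decoding as a stand-in for a deserializer. The function names are hypothetical; `'txn` plays the role of the lifetime of the byte slice handed out by LMDB.

```rust
use std::str::Utf8Error;

/// Owned decoding: always copies the bytes into a new `String`
/// (analogous to `DeserializeOwned`).
pub fn decode_owned(bytes: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(bytes).map(|s| s.to_owned())
}

/// Borrowed decoding: the result points into `bytes` and cannot outlive it
/// (analogous to `Deserialize<'txn>`).
pub fn decode_borrowed<'txn>(bytes: &'txn [u8]) -> Result<&'txn str, Utf8Error> {
    std::str::from_utf8(bytes)
}
```

The borrowed version returns a view over the same memory, so no allocation or copy happens, but the value is only usable while the source bytes are alive.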

Find a real name for this library

I thought about a shorter name for this library, just remember that it is a typed key-value store based on LMDB which itself means Lightning Memory-Mapped Database.
So here is a list of names I thought of:

zlmdb
whirlwind
discern
heed
agile
deft

Is it possible to avoid nested transactions for create_database?

https://github.com/Kerollmops/heed/blob/b235e9c3e9984737c967b5de1014b48f125dc28b/heed/src/env.rs#L347

Currently I can't use
env_builder.flag(heed::flags::Flags::MdbWriteMap);
because that flag doesn't work with nested transactions and create_database uses nested transactions internally

Make the typed Database wrap a DynDatabase

This way we can reduce the amount of code.

Clearing a database and writing to it doesn't work

There is a potentially strange behavior in LMDB: when you clear a database and write entries into it in the same transaction, the database is empty after you commit.

Add bors to the repository

Adding bors would ease the job of the reviewers.

Change the lifetimes of the `Iter` types

A great idea from @Diggsey to fix #108, an issue where we were allowing the user to keep references into the database while modifying it at the same time.

FWIW, you don't need to make the heed functions unsafe - you just need to make sure that all references returned from the database has a lifetime tied to the database, and that all database-modifying functions require a mutable borrow of the database in order to operate.

It could indeed be a good solution to distinguish the immutable iterators (RoIter, RoPrefixIter) from the mutable ones (RwIter, RwPrefixIter). The only difference would be the lifetimes of the keys and values returned. The read-only versions would simply return entries with the lifetime of the initial transaction. The read-write versions would return entries with a lifetime that comes from the database itself, and their modifying methods would take a new parameter, a mutable reference to the database. This way we make sure that we can't keep values while also modifying the database.

// for the read-only iterator, nothing changes.
fn Database::iter<'txn, T>(&self, txn: &'txn RoTxn<T>) -> Result<RoIter<'txn, KC, DC>>;

// but for the read-write iterator, we introduce a new lifetime.
fn Database::iter_mut<'txn, 'db, T>(&'db self, txn: &'txn mut RwTxn<T>) -> Result<RwIter<'txn, 'db, KC, DC>>;

// and we also change the del/put_current, and append methods,
// we now ask for a mutable reference of the database.
fn RwIter::put_current(&mut self, &mut db, key: &KC::EItem, data: &DC::EItem) -> Result<bool>;

// this is because the <RwIter as Iterator>::next method now
// returns entries that are tied to the database itself.
impl<'txn, 'db, KC, DC> Iterator for RwIter<'txn, 'db, KC, DC>
where
    KC: BytesDecode<'db>,
    DC: BytesDecode<'db>,
{
    type Item = Result<(KC::DItem, DC::DItem)>;
    fn next(&mut self) -> Option<Self::Item>;
}

I am not sure it will work: the initial Database::iter_mut method takes a &Database while RwIter::put_current can only be used with a &mut Database parameter, and I am not sure that Rust will accept that. I will try when I have time.

Make the CI test and run the examples

The current CI doesn't run the example programs. This is an issue as bugs can also be found with them.

mdbx: Unable to compile on windows

Hi, thank you for making this crate. It's a pleasure to work with this crate.

But unfortunately, I am unable to compile my program on Windows when using the mdbx backend. I am getting the following error.
Also, when I compile with the default features, i.e. lmdb, it compiles just fine.

Error

error: linking with `link.exe` failed: exit code: 1120
  |
  = note: "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Enterprise\\VC\\Tools\\MSVC\\14.29.30037\\bin\\HostX64\\x64\\link.exe" "/NOLOGO" "/NXCOMPAT" "/LIBPATH:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps\\ky.ky.56mjjk9i-cgu.0.rcgu.o" "/OUT:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps\\ky.exe" "/OPT:REF,ICF" "/DEBUG" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\intrinsic.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\liballoc.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\libcore.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\libstd.natvis" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps" "/LIBPATH:D:\\a\\ky\\ky\\target\\release\\deps" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\build\\mdbx-sys-2c7e90a2ee211e79\\out" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\build\\zstd-sys-8deae39dfcf40489\\out" "/LIBPATH:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\rustcaoipm6\\libzstd_sys-c946d6399ca63c58.rlib" "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\rustcaoipm6\\libmdbx_sys-5fd7a177ed6819e6.rlib" "C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libcompiler_builtins-c115f0a110b00510.rlib" "bcrypt.lib" "advapi32.lib" "cfgmgr32.lib" "gdi32.lib" "kernel32.lib" "msimg32.lib" "ole32.lib" "opengl32.lib" "shell32.lib" "user32.lib" "winspool.lib" "advapi32.lib" "ws2_32.lib" "userenv.lib" "libcmt.lib"
  = note:    Creating library D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.lib and object D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.exp

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtClose referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtQuerySystemInformation referenced in function mdbx_osal_bootid

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtCreateSection referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtMapViewOfSection referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtUnmapViewOfSection referenced in function mdbx_mresize

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtAllocateVirtualMemory referenced in function mdbx_mresize

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtFreeVirtualMemory referenced in function mdbx_mresize

          D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.exe : fatal error LNK1120: 7 unresolved externals

          

error: aborting due to previous error

error: could not compile `ky`

To learn more, run the command again with --verbose.
Error: The process 'C:\Rust\.cargo\bin\cargo.exe' failed with exit code 101

Related GHA: https://github.com/numToStr/ky/runs/2802130235
Repo: https://github.com/numToStr/ky

I am still new to rust. So any help would be appreciated.

Thanks.

Remove the PolyDatabase type

The PolyDatabase was created to allow library users to open an LMDB database without specifying the key and data types. However, I think this type is just a redundant struct that forces me to port every new method and feature to both the Database and PolyDatabase types.

We should remove this struct in favor of a simple alias on the Database struct with ByteSlices, making it as simple to use as before and reducing the amount of code to use and maintain. The PolyDatabase code could then be replaced by, for example, a new DupDatabase struct that supports duplicate data.

pub type PolyDatabase = Database<ByteSlice, ByteSlice>;

Format with merge_imports=true

Currently this is the only way to get stable deterministic import formatting. I also use it by default in VSCode.

Support duplicate keys

It doesn't appear that the API supports databases with multiple values for a key (the MDB_DUPSORT flag)

Type check the database openings

When you open a database you specify the key and data types, but the library doesn't currently ensure that the types given the second and subsequent times you open the database are the same as the first time.

We could ensure that the database types are the same at runtime. For now, the following technique can't ensure that types are the same between program runs. Redis does this by asking for type names and storing those on disk.

Here is an example of a possible system ensuring the types at runtime only.
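One possible shape for such a runtime-only check, as a minimal sketch: a registry that remembers the `TypeId` pair used the first time a database name is opened and rejects later openings with different types. `TypeRegistry` and `check_open` are hypothetical names, not part of heed's API, and nothing here survives a program restart.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical registry of the (key, data) types each database was opened with.
#[derive(Default)]
pub struct TypeRegistry {
    opened: Mutex<HashMap<Option<String>, (TypeId, TypeId)>>,
}

impl TypeRegistry {
    /// Records the types on the first opening, compares them on later ones.
    /// `None` stands for the unnamed database.
    pub fn check_open<K: 'static, V: 'static>(&self, name: Option<&str>) -> Result<(), String> {
        let wanted = (TypeId::of::<K>(), TypeId::of::<V>());
        let key = name.map(|n| n.to_owned());
        let mut opened = self.opened.lock().unwrap();
        let known = *opened.entry(key).or_insert(wanted);
        if known == wanted {
            Ok(())
        } else {
            Err(format!("database {name:?} was already opened with different types"))
        }
    }
}
```

Because `TypeId` values are only meaningful within one build of a program, this catches mismatches inside a single run but says nothing about what an earlier run stored on disk.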

Question about environment sharing

Hi,

I'm migrating from mozilla's rkv, since I'm interested in using mdbx and heed also provides all the functionality I need out of the box.

I have a doubt regarding the instantiation of Environments. In rkv there is a Manager interface for operating the database, in order to guarantee that the same environment is not opened twice.

While reviewing this crate, I noticed that you already wrap the environment with an RwLock. So, is it correct to assume that heed will uphold the "single environment per process" guarantee without a need for an ad-hoc manager in my application code? Can I safely open and use heed::Env directly across many async handlers in my application?

I'm fairly new to Rust, so I'm sorry if this is an obvious question.

Thank you!

Disallow opening multiple transactions on the same thread

We could return a custom error when we detect that a single thread is trying to open a transaction, but one is already opened. We could use a thread local to do that.

Is it possible to specify LMDB flags such as MDB_NOSYNC?

Looks like flags are not yet supported. Did I miss them?

Thanks for this nice wrapper.

Laurent

Remove the generic types of the RoTxn and RwTxn

We introduced a generic type on the RoTxn and RwTxn structs to make sure that a transaction created on one environment is not used with another environment. This introduced more methods on the Env struct to create typed transactions.

I am not quite sure it makes sense now that we added a runtime check to make sure the transactions are valid and match the same environment.

master with mdbx does not compile

error[E0308]: mismatched types
   --> /home/greg/.local/share/cargo/git/checkouts/heed-e741d25df84a0eb8/755ef1e/heed/src/db/polymorph.rs:219:17
    |
219 |                 txn.txn,
    |                 ^^^^^^^ expected *-ptr, found struct `txn::RoTxn`
    |
    = note: expected raw pointer `*mut mdbx_sys::MDBX_txn`
                    found struct `txn::RoTxn<T>`

Remapping Types on Iterators

It would be great to have a way to remap the types of the different Iterator heed types. This way it would be easier to, for example, create a prefix_iter based on some bytes and remap the decoding types to be the interesting ones.

Introducing the remap_key_type and remap_value_type would be great too!

https://docs.rs/heed/0.10.1/heed/struct.Database.html#method.remap_types

Segfault on mdbx with small max_dbs

Repro:

let env = heed::EnvOpenOptions::new().max_dbs(10).open(&config.state_path).unwrap();
let db = env.create_database(Some("foo"));

Increasing max_dbs to 20 seems to work around the issue.

Moving from zerocopy to bytemuck

This library currently uses the zerocopy crate to provide some useful codecs for common types and types which implement the AsBytes/FromBytes traits (i.e. u8, [u8]). I find it complex, if not impossible, to contribute to this crate: it has no repository of its own, as it has been developed for the new Fuchsia OS, and its sources are mixed with the sources of the OS (in a sub-repository at least).

The bytemuck library seems to be much easier to contribute to: it is hosted on GitHub and seems to be much more popular than the former (114k downloads per month compared to 15k). It brings a better API, at least for heed, as it can report which kind of problem happened when a cast fails, which would simplify some codecs.

I looked into the crate and found that the change, despite breaking, will be easy. I also found out that we can cast slices of different types by using the try_cast_slice function.

I am just not sure how to expose integers as big-endian, ensuring that the global order is preserved. We may need to create codecs for every integer type, as zerocopy has done, possibly using byteorder.
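Why big-endian matters here can be shown with the standard library alone: when keys are compared bytewise, big-endian encodings sort in numeric order while little-endian ones do not. The helper names below are hypothetical and only wrap `to_be_bytes` and `to_le_bytes`.

```rust
/// Encodes an integer so that bytewise comparison matches numeric order.
pub fn encode_be(n: u32) -> [u8; 4] {
    n.to_be_bytes()
}

/// Little-endian encoding, shown only for contrast.
pub fn encode_le(n: u32) -> [u8; 4] {
    n.to_le_bytes()
}

/// Sorts the encoded keys bytewise, then decodes them again.
pub fn sort_via_be_keys(values: &[u32]) -> Vec<u32> {
    let mut keys: Vec<[u8; 4]> = values.iter().map(|n| encode_be(*n)).collect();
    keys.sort();
    keys.iter().map(|k| u32::from_be_bytes(*k)).collect()
}
```

For example, 1 and 256 compare correctly as `[0, 0, 0, 1] < [0, 0, 1, 0]` in big-endian, but as `[1, 0, 0, 0] > [0, 1, 0, 0]` in little-endian.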

Create two different libraries: heed and heedx

As a Reddit user pointed out, using a feature flag to make heed work with either LMDB or MDBX is not the best way to do so.
I did this because it was the easiest way to keep the same code base and only tune some functions.

Recently I found a potential alternative: publishing two libraries with two different names while keeping the same code base.

#[cfg(pkg_name = "bonjour-hello")]
fn main() {
    println!("bonjour-hello");
}

#[cfg(pkg_name = "bonjour")]
fn main() {
    println!("bonjour");
}

#[cfg(pkg_name = "hello")]
fn main() {
    println!("hello");
}

bonjour-hello; RUSTFLAGS='--cfg pkg_name="bonjour"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/bonjour-hello`
bonjour

bonjour-hello; RUSTFLAGS='--cfg pkg_name="hello"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/bonjour-hello`
hello

bonjour-hello; RUSTFLAGS='--cfg pkg_name="bonjour-hello"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.25s
     Running `target/debug/bonjour-hello`
bonjour-hello

I am here to mentor anyone who is interested in helping me fix this issue. Anyone who would like to see where the action takes place can look at the mdb/mod.rs file; this is, in part, where changes will be made.

Env open with `Flags::MdbNoSubDir` flag requires target file to exists

I'm trying to create a DB like "some_dir/custom.mdb". Where "custom.mdb" is a file, not a directory. I use Flags::MdbNoSubDir to achieve it.
The current implementation requires the target file to exist due to the use of canonicalize, even though underlying mdb_env_open works without creating the file.

It works if I create an empty file manually, but it would be great to not touch the DB file explicitly.
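A minimal sketch of one way around this, assuming the only problem is that `canonicalize` fails on a missing final component: canonicalize the parent directory and re-attach the file name. The helper name is hypothetical and this is not heed's current code.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: canonicalize `path`, tolerating a missing final component.
pub fn canonicalize_allowing_missing_file(path: &Path) -> io::Result<PathBuf> {
    match path.canonicalize() {
        Ok(resolved) => Ok(resolved),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            // Without a final component there is nothing to re-attach.
            let file_name = match path.file_name() {
                Some(name) => name,
                None => return Err(err),
            };
            // A bare file name has an empty parent, meaning the current directory.
            let parent = match path.parent() {
                Some(dir) if !dir.as_os_str().is_empty() => dir,
                _ => Path::new("."),
            };
            Ok(parent.canonicalize()?.join(file_name))
        }
        Err(err) => Err(err),
    }
}
```

The parent directory still has to exist, which matches the use case above where only "custom.mdb" is missing.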

mdbx crashes with SIGBUS in builds with debug-assertions=false

(Latest nightly Rust, latest versions on crates.io, FreeBSD -CURRENT)

create_database crashes when built with debug-assertions = false. This is purely about debug-assertions, i.e. there's no crash if debug-assertions = true is added to the default [profile.release] which has opt-level = 3.

  * frame #0: 0x0000000001e3dc87 unrelentingtech`mdbx_cursors_eot(txn=0x0000621000003d00, merge=1) at mdbx.c:8575:28
    frame #1: 0x0000000001e3c8d2 unrelentingtech`mdbx_txn_commit(txn=0x0000621000003d00) at mdbx.c:10606:5
    frame #2: 0x0000000001e33cd8 unrelentingtech`heed::txn::RoTxn$LT$T$GT$::commit::h728b59f0bea95370(self=<unavailable>) at txn.rs:31:42
    frame #3: 0x0000000001e3434f unrelentingtech`heed::txn::RwTxn$LT$T$GT$::commit::h94ff7090c06fdc54(self=<unavailable>) at txn.rs:97:9
    frame #4: 0x0000000001e2df8d unrelentingtech`heed::env::Env::raw_create_database::h0c6de3631f89b0ab(self=0x00007fff00000000, name=<unavailable>, types=Option<(core::any::TypeId, core::any::TypeId)> @ 0x00007fffffffc260, parent_wtxn=<unavailable>) at env.rs:343:17
    frame #5: 0x00000000019d6ba9 unrelentingtech`heed::env::Env::create_database_with_txn::hd177dd60862eeb69(self=<unavailable>, name=<unavailable>, parent_wtxn=<unavailable>) at env.rs:293:9
    frame #6: 0x00000000019d6494 unrelentingtech`heed::env::Env::create_database::heb910b58d50d434d(self=0x00007fffffffc6a0, name=<unavailable>) at env.rs:278:18

* thread #1, name = 'unrelentingtech', stop reason = signal SIGBUS: hardware error
    frame #0: 0x0000000002d029f6 unrelentingtech`mdbx_cursors_eot(txn=0x0000621000003d00, merge=1) at mdbx.c:8575:28
   8572
   8573   for (i = txn->mt_numdbs; --i >= 0;) {
   8574     for (mc = cursors[i]; mc; mc = next) {
-> 8575       unsigned stage = mc->mc_signature;
   8576       mdbx_ensure(txn->mt_env,
   8577                   stage == MDBX_MC_SIGNATURE || stage == MDBX_MC_WAIT4EOT);
   8578       next = mc->mc_next;
(lldb) fr v
(MDBX_txn *) txn = 0x0000621000003d00
(unsigned int) merge = 1
(MDBX_cursor **) cursors = 0x0000621000004b38
(MDBX_cursor *) mc = 0xbebebebebebebebe
(MDBX_cursor *) next = 0x00007fffffffab60
(MDBX_cursor *) bk = 0x00007fffffffab20
(MDBX_xcursor *) mx = 0x00007fffffffa860
(int) i = 2
(unsigned int) stage = 4294943168
(lldb) p *cursors
(MDBX_cursor *) $0 = 0x0000000000000000
(lldb) p cursors[1]
(MDBX_cursor *) $1 = 0x0000000000000000
(lldb) p cursors[2]
(MDBX_cursor *) $2 = 0xbebebebebebebebe
(lldb) p cursors[3]
(MDBX_cursor *) $3 = 0xbebebebebebebebe
(lldb) p cursors[4]
(MDBX_cursor *) $4 = 0xbebebebebebebebe

Improve the poly/database length function

We currently create an iterator and count the entries it returns to compute the number of entries a database stores.

There is a better way that is not O(n) which is using the num_entries field of the Stat struct that is returned by the mdb_stat function.

Make the LmdbError::Other variant be change into an io::Error

We can extract the OS error code from the LmdbError::Other variant, pass it to io::Error::from_raw_os_error, and return an io::Error to the user.
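A minimal sketch of that conversion. The two enums are simplified, hypothetical stand-ins for heed's error types; only `io::Error::from_raw_os_error` is a real standard library call.

```rust
use std::io;

/// Hypothetical subset of LMDB errors (an assumption, not heed's actual enum).
#[derive(Debug)]
pub enum LmdbError {
    NotFound,
    /// Any code that is not a known LMDB error, typically an OS errno.
    Other(i32),
}

/// Hypothetical top-level error type.
#[derive(Debug)]
pub enum Error {
    Mdb(LmdbError),
    Io(io::Error),
}

impl From<LmdbError> for Error {
    fn from(err: LmdbError) -> Error {
        match err {
            // Unknown codes are treated as OS errors and surfaced as `io::Error`.
            LmdbError::Other(code) => Error::Io(io::Error::from_raw_os_error(code)),
            known => Error::Mdb(known),
        }
    }
}
```

The resulting `io::Error` keeps the original code, available through `raw_os_error()`, and gains the platform's error message and `ErrorKind`.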

Rework heed to be lightweight and simpler to maintain

Make the `EnvOpenOption` available.

Currently, we can specify the EnvOpenOption when starting heed, but then we can’t query them.
It would be cool to have a wrapper around mdb_env_info.

Running the tests changes the test LMDB env files

When we run the tests, the test files, which are LMDB environments, are modified. I find this strange, as tests must be reproducible and should not change the on-disk data, or should at least generate the same content.

cargo test

Fix the Database implementation to address the documentation

The issue with the current heed library is that it does not ensure either of the two points below. Here is the documentation of the mdb_dbi_open LMDB function:

The database handle will be private to the current transaction until the transaction is successfully committed. If the transaction is aborted the handle will be closed automatically. After a successful commit the handle will reside in the shared environment, and may be used by other transactions.
This function must not be called from multiple concurrent transactions in the same process. A transaction that uses this function must finish (either commit or abort) before any other transaction in the process may use this function.

Remove the `vendored` feature

We should consider removing the vendored feature from heed as it can bring more issues than help. I would like to remove the possibility of being able to use another version of LMDB than the one provided by heed.

Support for custom key comparison function

LMDB in its great design supports a lot of features, one that is quite interesting though is the fact that it allows users to define their own comparison function. By default, when no custom function is defined, the keys are compared lexically, with shorter keys collating before longer keys.

If we want to support this, a new design must be found. This comparison function must be Rust idiomatic in the sense that it must get decoded values (not the raw bytes) as parameters and return an Ordering struct. It would be preferable if this function doesn't impact the Database type, in the sense that it would be preferable to use an Option<fn> instead of an Option<F> where F: Fn(), this way it would make the Database as easily Clonable/Copyable as before.
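A minimal sketch of the `Option<fn>` idea, assuming a hypothetical `Comparator<K>` holder that works on already decoded keys. It is not heed's API and does not show how the function would be registered with LMDB; it only illustrates that a plain function pointer keeps the holder `Copy`. The fallback to `K`'s own `Ord` stands in for LMDB's default bytewise comparison.

```rust
use std::cmp::Ordering;

/// Hypothetical holder for an optional custom key comparison.
pub struct Comparator<K> {
    custom: Option<fn(&K, &K) -> Ordering>,
}

// Manual impls: a function pointer is `Copy` whatever `K` is,
// so no `K: Copy` bound is needed.
impl<K> Clone for Comparator<K> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<K> Copy for Comparator<K> {}

impl<K: Ord> Comparator<K> {
    pub fn new(custom: Option<fn(&K, &K) -> Ordering>) -> Self {
        Comparator { custom }
    }

    /// Uses the custom function when present, `K`'s own ordering otherwise.
    pub fn compare(&self, a: &K, b: &K) -> Ordering {
        match self.custom {
            Some(f) => f(a, b),
            None => a.cmp(b),
        }
    }
}

/// Example custom ordering: descending integers.
pub fn reverse_u32(a: &u32, b: &u32) -> Ordering {
    b.cmp(a)
}
```

With `Option<F> where F: Fn()`, the holder would instead carry an extra type parameter and would only be `Copy` when the closure is.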

Make transaction opening more safe

Heed could ensure that only one write transaction is ever opened on the same thread.

It can create a thread_local atomic counter for write transactions and raise an error (panic or not) when the user tries to open a write transaction while another one is already open.

According to the LMDB documentation, there must never be more than one transaction on the same thread at any time. We could enforce that in the read_txn/write_txn functions by recording in a global variable whether a transaction is already open.

A thread can only use one transaction at a time, plus any child transactions. Each transaction belongs to one thread. See below. The MDB_NOTLS flag changes this for read-only transactions.
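A minimal sketch of the thread-local check, assuming a hypothetical `TxnGuard` that would live inside a transaction type. It uses a per-thread boolean flag rather than a counter and does not model child transactions or MDB_NOTLS.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    /// True while the current thread has a transaction open.
    static TXN_OPEN: Cell<bool> = Cell::new(false);
}

/// Hypothetical error returned when a second transaction is requested.
#[derive(Debug, PartialEq)]
pub struct TxnAlreadyOpen;

/// Hypothetical guard a transaction would hold for its whole lifetime.
pub struct TxnGuard {
    // Raw pointers are not `Send`, so the guard stays on its thread
    // and the flag is always reset on the thread that set it.
    _not_send: PhantomData<*const ()>,
}

impl TxnGuard {
    pub fn begin() -> Result<TxnGuard, TxnAlreadyOpen> {
        TXN_OPEN.with(|open| {
            if open.get() {
                Err(TxnAlreadyOpen)
            } else {
                open.set(true);
                Ok(TxnGuard { _not_send: PhantomData })
            }
        })
    }
}

impl Drop for TxnGuard {
    fn drop(&mut self) {
        TXN_OPEN.with(|open| open.set(false));
    }
}
```

A second `TxnGuard::begin()` on the same thread returns `Err(TxnAlreadyOpen)` until the first guard is dropped, while other threads are unaffected.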

Make more types Send+Sync

For my program I need Send+Sync on more types because they are used in an async context. Here's an example error for iterators:

`*mut lmdb_sys::MDB_cursor` cannot be sent between threads safely

It would also be nice if heed::Error was Send+Sync.

Is that possible or do you not support this?

Make the environments opening safe

We could use the same approach as the rkv::Manager to ensure that an environment located at a given path is opened only once per program.

It is a simple singleton HashMap storing the canonicalized path and its associated environment. When the user wants to open one, they use the Manager, which will either open a new one or return the already opened one, using an RwLock on the HashMap to ensure no concurrent opening is done.
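A minimal sketch of such a manager using only the standard library. `Env` is a hypothetical placeholder for a real opened environment and `open_once` is an assumed name; this does not reproduce rkv's actual implementation.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for an opened environment.
pub struct Env {
    pub path: PathBuf,
}

/// Process-wide map from canonical path to the environment opened there.
fn registry() -> &'static RwLock<HashMap<PathBuf, Arc<Env>>> {
    static OPENED: OnceLock<RwLock<HashMap<PathBuf, Arc<Env>>>> = OnceLock::new();
    OPENED.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Returns the already opened environment for `path`, or opens a new one.
pub fn open_once(path: &Path) -> io::Result<Arc<Env>> {
    let canonical = path.canonicalize()?;
    // Fast path: a shared read lock is enough when the entry already exists.
    if let Some(env) = registry().read().unwrap().get(&canonical) {
        return Ok(Arc::clone(env));
    }
    // Slow path: the write lock serializes openings; `entry` re-checks the map
    // in case another thread inserted between the two locks.
    let mut opened = registry().write().unwrap();
    let env = opened
        .entry(canonical.clone())
        .or_insert_with(|| Arc::new(Env { path: canonical }));
    Ok(Arc::clone(env))
}
```

Two different spellings of the same directory resolve to the same canonical key and therefore to the same `Arc<Env>`.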

Think about reintroducing the `MDB_IDL_LOGN` features

The Mozilla team changed the midl.h file and made the MDB_IDL_LOGN define customizable using a Cargo TOML feature. This change was made to reduce the number of free pages when opening a database in read-write mode.

A bunch of questions for this issue to be closed:

Do we care about this parameter for Meilisearch?
Do other companies need this?
Should we reintroduce this patch to the LMDB source code?
Should hyc introduce the #ifndef #endif changes in the official source code?

Support dynamic database types

With the current BytesEncode/Decode traits it is feasible to open databases without fixing the types for the entire lifetime of the Database, and instead specify them for each get/put call.

Doing so would be cool for a Database that can store different data types, but what about iterators? How would we deal with them?

One solution could be to specify that an iterator always yields keys/data of the same types for the lifetime of the Iter.

Support of safe environment deletion

There is a fundamental API that heed must support for the new MeiliSearch engine to be able to support index deletion: mdb_env_close.

This function closes a given environment; after an environment is safely closed, it is possible to safely delete it from disk too.

The problem with this function call is that the API user must ensure that no transactions are still alive, but there is no way to ensure that ourselves as the transaction mutexes are handled by LMDB itself. The only way would be to redefine the mutex logic ourselves.

And this is why this issue is here: defining the mutex logic in heed and disabling the LMDB one (only if a feature is enabled).

I took a look at the best library for synchronization primitives in Rust, parking_lot, and the conclusion is that we should use a Mutex for write transactions and a RwLock for read-only transactions.

Thanks to the RwLock we are able to block readers by write-locking it, preventing them from acquiring read transactions. This allows heed to ensure that it has exclusive access to both the write and the read parts, so it can safely delete the environment or perform any other operation that requires exclusive access.
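A minimal sketch of that locking scheme. It uses `std::sync` in place of parking_lot so that it has no dependencies, and all type names are hypothetical. Only the non-blocking `try_` variants are shown; blocking versions would call `lock`, `read` and `write` instead.

```rust
use std::sync::{Mutex, MutexGuard, RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Hypothetical lock pair owned by an environment.
#[derive(Default)]
pub struct TxnLocks {
    writer: Mutex<()>,
    readers: RwLock<()>,
}

/// Held by a read-only transaction.
pub struct ReadTxn<'a> {
    _readers: RwLockReadGuard<'a, ()>,
}

/// Held by the single write transaction.
pub struct WriteTxn<'a> {
    _writer: MutexGuard<'a, ()>,
}

/// Held while closing or deleting the environment.
pub struct Exclusive<'a> {
    _writer: MutexGuard<'a, ()>,
    _readers: RwLockWriteGuard<'a, ()>,
}

impl TxnLocks {
    /// Many readers may hold the shared side at the same time.
    pub fn try_read_txn(&self) -> Option<ReadTxn<'_>> {
        let _readers = self.readers.try_read().ok()?;
        Some(ReadTxn { _readers })
    }

    /// Only one writer at a time; readers are not blocked by it.
    pub fn try_write_txn(&self) -> Option<WriteTxn<'_>> {
        let _writer = self.writer.try_lock().ok()?;
        Some(WriteTxn { _writer })
    }

    /// Succeeds only when no reader and no writer is active.
    pub fn try_exclusive(&self) -> Option<Exclusive<'_>> {
        let _writer = self.writer.try_lock().ok()?;
        let _readers = self.readers.try_write().ok()?;
        Some(Exclusive { _writer, _readers })
    }
}
```

While an `Exclusive` guard is alive, both `try_read_txn` and `try_write_txn` return `None`, which is the window in which the environment could be closed and removed.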

There is one small remaining problem: ensuring that the db handle is not used again by the library user. Maybe trying to lock a deleted db should return a specific error, informing the library user that this db handle is invalid.

To enable the user-end locking feature we must first disable the inherent locking system of LMDB; we can do so by enabling the NO_LOCK flag when opening an environment.

However, I am not sure about the description of the NO_LOCK flag of mdb_env_open.

For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins.

Does it mean that it is not possible to have concurrent readers and writers when the NO_LOCK flag is set?

Is it possible to get range working with slice types?

All the slice types define EItem = [T], since [T] is not Sized, RangeFrom<&[T]> (and others) do not implement RangeBounds.
Is there some workaround that I am missing?

My use case is a simple key like [u8; N] or Vec<u8>, etc.
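At the standard library level, one workaround is that a tuple of `Bound<&T>` implements `RangeBounds<T>` even when `T` is unsized. The sketch below shows it on a `BTreeMap`, which stands in for a database here; whether heed's own range method accepts it depends on the bounds it declares, so treat that part as an assumption. The helper names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::{Bound, RangeBounds};

/// Builds the equivalent of `start..` for an unsized slice key.
pub fn from_key(start: &[u8]) -> (Bound<&[u8]>, Bound<&[u8]>) {
    (Bound::Included(start), Bound::Unbounded)
}

/// Collects the keys of `map` that fall inside `range`.
pub fn keys_in<'a, R: RangeBounds<[u8]>>(
    map: &'a BTreeMap<Vec<u8>, u32>,
    range: R,
) -> Vec<&'a [u8]> {
    map.range::<[u8], R>(range).map(|(k, _)| k.as_slice()).collect()
}
```

`RangeFrom<&[u8]>` only provides `RangeBounds<&[u8]>`, whereas the tuple form provides `RangeBounds<[u8]>`, which is what a key type of `[u8]` needs.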

Trait std::iter::DoubleEndedIterator

Since an LMDB cursor is capable of quickly going to the last entry and getting the previous one, it should support the DoubleEndedIterator trait, allowing last-to-first iteration of a range.
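A minimal sketch of what the trait asks for, using a hypothetical in-memory `RangeIter` over sorted entries in place of a real cursor. In an actual implementation, `next_back` would presumably be backed by the cursor's MDB_LAST and MDB_PREV operations; here two indices play that role.

```rust
/// Hypothetical stand-in for a cursor-backed iterator over sorted entries.
pub struct RangeIter<'a> {
    entries: &'a [(Vec<u8>, Vec<u8>)],
    front: usize, // next index to yield from the start
    back: usize,  // one past the next index to yield from the end
}

impl<'a> RangeIter<'a> {
    pub fn new(entries: &'a [(Vec<u8>, Vec<u8>)]) -> Self {
        RangeIter { entries, front: 0, back: entries.len() }
    }
}

impl<'a> Iterator for RangeIter<'a> {
    type Item = (&'a [u8], &'a [u8]);

    fn next(&mut self) -> Option<Self::Item> {
        if self.front >= self.back {
            return None;
        }
        let entries = self.entries;
        let (k, v) = &entries[self.front];
        self.front += 1;
        Some((k.as_slice(), v.as_slice()))
    }
}

impl<'a> DoubleEndedIterator for RangeIter<'a> {
    fn next_back(&mut self) -> Option<Self::Item> {
        if self.front >= self.back {
            return None;
        }
        self.back -= 1;
        let entries = self.entries;
        let (k, v) = &entries[self.back];
        Some((k.as_slice(), v.as_slice()))
    }
}
```

Once `next_back` exists, adapters such as `rev()` come for free, and the two ends stop as soon as they meet so no entry is yielded twice.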

Support env_stat

Taken from the python bindings docs for LMDB: https://lmdb.readthedocs.io/en/release/#lmdb.Environment.stat

Would it be possible to expose that function?

Make it compile and work on Windows

error[E0433]: failed to resolve: could not find `unix` in `os`
 --> C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\heed-0.5.0\src\env.rs:5:14
  |
5 | use std::os::unix::io::AsRawFd;
  |              ^^^^ could not find `unix` in `os`

error[E0599]: no method named `as_raw_fd` found for type `std::fs::File` in the current scope
   --> C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\heed-0.5.0\src\env.rs:307:23
    |
307 |         let fd = file.as_raw_fd();
    |                       ^^^^^^^^^ method not found in `std::fs::File`

error: aborting due to 2 previous errors

Document the safety issues with the unnamed database

LMDB has an important restriction on the unnamed database: when named databases are opened, their names are stored as keys in the unnamed one and are immutable.

I faced a big bug that triggered a SIGSEGV when copying all of the entries of one unnamed database into another env: heed tried to write the values associated with the opened named databases multiple times, which sometimes triggered a SIGSEGV.

I didn't take the time to reproduce this behavior, but we must, at least, document it!

Support custom encoding/decoding errors

It would be better and easier to debug a program if the BytesEncode and BytesDecode traits could return any error type.

To do so we need to modify the Error enum and more specifically the Encoding and Decoding variants to wrap a Box<dyn Error>.

Assertion 'root > 1' failed in mdb_page_search()

Hi there!
I am building a system that uses heed and while running some tests it prints the following and then crashes:

lmdb-sys/lmdb/libraries/lib:6637: Assertion 'root > 1' failed in mdb_page_search()

To me it is unclear why the error sometimes pops up and sometimes does not. My tests are really simple: just one thread storing data and reading it. Could someone explain to me what this error means?

Q: would you be interested in writing rust bindings for libmdbx?

libmdbx is a revised and extended descendant of LMDB.

Unable to create database when MDB_WRITEMAP is set

To reproduce:

let path = Path::new("target").join("test.mdb");
fs::create_dir_all(&path)?;
let mut env_builder = EnvOpenOptions::new();
unsafe {
    env_builder.flag(Flags::MdbNoSync);
    env_builder.flag(Flags::MdbWriteMap);
}
let env = env_builder.map_size(10 * 1024 * 1024 * 1024).max_dbs(1000).open(path)?;
let db: Database<ByteSlice, Unit> = env.create_database(Some("test"))?;

fails with:

Error: Mdb(BadTxn)

Works fine if env_builder.flag(Flags::MdbWriteMap); is commented out.
