meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦

Home Page: https://docs.rs/heed

License: MIT License

Rust 100.00%
key-value-store memory-mapping lmdb wrapper typed

heed's Introduction

heed

A Rust-centric LMDB abstraction with minimal overhead.

heed enables the storage of various Rust types within LMDB, extending support to include Serde-compatible types.

For usage examples, see heed/examples/.

Building from Source

You can use this command to clone the repository:

git clone --recursive https://github.com/meilisearch/heed.git
cd heed
cargo build

However, if you already cloned it and forgot to initialize the submodules, execute the following command:

git submodule update --init

heed's People

Contributors

aalekhpatel07, arthurprs, aureliadolo, curquiza, darnuria, hinto-janai, kerollmops, lquerel, manythefish, marinpostma, parasyte, quake, seadve, sphw, wackbyte, xiaoyawei

heed's Issues

OS Error 22: invalid argument when opening transactions on threads

Bug description

Running the following example, which uses heed, makes the unwrap calls panic with OS error 22 (invalid argument):

use heed::EnvOpenOptions;

fn main() {
    const NBR_THREADS: usize = 11;
    const NBR_DB: u32 = 100;

    let mut handles = vec![];
    for _i in 0..NBR_THREADS {
        let h = std::thread::spawn(|| {
            let dir = tempfile::tempdir_in(".").unwrap();

            let mut options = EnvOpenOptions::new();
            options.max_dbs(NBR_DB);

            let env = options.open(dir.path()).unwrap();
            for i in 0..NBR_DB {
                env.create_poly_database(Some(&format!("db{i}"))).unwrap();
            }
        });
        handles.push(h);
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("ok!");
}

(see the associated repository for more information)

Raw lmdb reproducer

The issue can be further minimized in C, directly using the master branch (not master3) of lmdb instead of heed, with the following:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include "../lmdb.h"
#include "../midl.h"

#define NBR_THREADS 20
#define NBR_DB 2

void* run(void* param) {
    char* dir_name = (char*) param;
    printf("Starting %s\n", dir_name);
    MDB_env* env;
    mdb_env_create(&env);
    mdb_env_set_maxdbs(env, NBR_DB);
    if (mdb_env_open(env, dir_name, MDB_NOTLS, 0600) != 0) {
        printf("ERROR opening env\n");
        goto exit;
    }
    int parent_txn_res;

    for (int i=0; i<NBR_DB;++i) {
        char* db_name = malloc(100);
        sprintf(db_name, "db_%i", i);

        MDB_txn* txn;

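        // Keep the return code so the message below reports the real failure.
        parent_txn_res = mdb_txn_begin(env, NULL, 0, &txn);
        if (parent_txn_res != 0) {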
            printf("ERROR opening nested txn\n");
            printf("[%s]ERROR opening parent_txn, %d\n", dir_name, parent_txn_res);
            fprintf(stderr, "errno code: %d ", errno);
            perror("Cause");
            goto exit_loop;
        }

        MDB_dbi db;
        sleep(1);
        mdb_txn_commit(txn);
        free(db_name);
        continue;
exit_loop:
        free(db_name);
        goto exit;
    }
    printf("ok env\n");
exit:
    free(dir_name);
    mdb_env_close(env);

    return NULL;
}

int main(int argc, char** argv) {
    pthread_t threads[NBR_THREADS];
    for (int i = 0; i < NBR_THREADS; ++i) {
        char* dir_name = malloc(100);
        sprintf(dir_name, "tmp_env_%i", i);
        pthread_create(&threads[i], NULL, run, dir_name);
    }
    
    for (int i = 0; i < NBR_THREADS; ++i) {
        void* retval;
        pthread_join(threads[i], &retval);
    }
    printf("ok!\n");
    return 0;
}

(see the associated repository for more information)

Likely related to meilisearch/meilisearch#3017

Refactor error type to use `thiserror`

Currently, when I try to use heed with the ? operator for error conversion, I get the following:

error[E0277]: `(dyn std::error::Error + 'static)` cannot be shared between threads safely
  --> tools\migrate\src\main.rs:17:68
   |
17 |             "heed" => Self::Heed(HeedDB::new(db::heed::new_db(path)?)),
   |                                                                    ^ `(dyn std::error::Error + 'static)` cannot be shared between threads safely
   |
   = help: the trait `Sync` is not implemented for `(dyn std::error::Error + 'static)`
   = note: required because of the requirements on the impl of `Sync` for `Unique<(dyn std::error::Error + 'static)>`
   = note: required because it appears within the type `Box<(dyn std::error::Error + 'static)>`
   = note: required because it appears within the type `heed::Error`
   = note: required because of the requirements on the impl of `From<heed::Error>` for `anyhow::Error`
   = note: required by `from`

The error type is a composite enum, which is fine, but the compiler reports that it cannot be shared between threads safely: the enum contains a Box<dyn std::error::Error>, which does not implement Sync.

I could also ask for the auto traits to be added to that box, but I think the error type would also benefit from better readability and maintainability if it were implemented with thiserror, for example like here.
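
A minimal sketch of the property being asked for, using only the standard library rather than thiserror: an error enum stays Send + Sync as long as every boxed error it holds is declared with those bounds. The enum and its variant names are illustrative assumptions, not heed's actual definition.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::io;

/// Hypothetical stand-in for a composite error enum.
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    /// The `+ Send + Sync` bounds on the boxed error keep the whole enum thread-safe.
    Encoding(Box<dyn StdError + Send + Sync + 'static>),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {}", e),
            Error::Encoding(e) => write!(f, "encoding error: {}", e),
        }
    }
}

impl StdError for Error {}

impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

/// Compiles only when `T` is both `Send` and `Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Calling assert_send_sync::<Error>() compiles with this definition and stops compiling if the bounds on the boxed variant are removed, which is the situation shown in the compiler output above.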

Find a real name for this library

I thought about a shorter name for this library, just remember that it is a typed key-value store based on LMDB which itself means Lightning Memory-Mapped Database.
So here is a list of names I thought of:

  • zlmdb
  • whirlwind
  • discern
  • heed
  • agile
  • deft

Change the lifetimes of the `Iter` types

A great idea from @Diggsey to fix #108, an issue where we were allowing the user to keep references into the database while modifying it at the same time.

FWIW, you don't need to make the heed functions unsafe - you just need to make sure that all references returned from the database have a lifetime tied to the database, and that all database-modifying functions require a mutable borrow of the database in order to operate.

It could indeed be a great solution to distinguish between the immutable iterators (RoIter, RoPrefixIter) and the mutable ones (RwIter, RwPrefixIter). The only difference would be the lifetimes of the keys and values returned. The read-only versions would simply return entries with the lifetime of the initial transaction. The read-write versions would return entries with a lifetime that comes from the database itself, and their modifying methods would take a new parameter, a mutable reference to the database. This way we make sure that we can't keep values while also modifying the database.

// for the read-only iterator, nothing changes.
fn Database::iter<'txn, T>(&self, txn: &'txn RoTxn<T>) -> Result<RoIter<'txn, KC, DC>>;

// but for the read-write iterator, we introduce a new lifetime.
fn Database::iter_mut<'txn, 'db, T>(&'db self, txn: &'txn mut RwTxn<T>) -> Result<RwIter<'txn, 'db, KC, DC>>;

// and we also change the del/put_current, and append methods,
// we now ask for a mutable reference of the database.
fn RwIter::put_current(&mut self, &mut db, key: &KC::EItem, data: &DC::EItem) -> Result<bool>;

// this is because the <RwIter as Iterator>::next method now
// returns entries that are tied to the database itself.
impl<'txn, 'db, KC, DC> Iterator for RwIter<'txn, 'db, KC, DC>
where
    KC: BytesDecode<'db>,
    DC: BytesDecode<'db>,
{
    type Item = Result<(KC::DItem, DC::DItem)>;
    fn next(&mut self) -> Option<Self::Item>;
}

I am not sure it will work: the initial Database::iter_mut method asks for a &Database, while RwIter::put_current can only be used with a &mut Database parameter, and I am not sure that Rust will accept that. I will try when I have time.

mdbx: Unable to compile on windows

Hi, thank you for making this crate. It's a pleasure to work with.

Unfortunately, I am unable to compile my program on Windows when using the mdbx backend. I am getting the following error.
When I compile with the default features, i.e. lmdb, it compiles just fine.

  • Error
error: linking with `link.exe` failed: exit code: 1120
  |
  = note: "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Enterprise\\VC\\Tools\\MSVC\\14.29.30037\\bin\\HostX64\\x64\\link.exe" "/NOLOGO" "/NXCOMPAT" "/LIBPATH:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps\\ky.ky.56mjjk9i-cgu.0.rcgu.o" "/OUT:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps\\ky.exe" "/OPT:REF,ICF" "/DEBUG" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\intrinsic.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\liballoc.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\libcore.natvis" "/NATVIS:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\etc\\libstd.natvis" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\deps" "/LIBPATH:D:\\a\\ky\\ky\\target\\release\\deps" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\build\\mdbx-sys-2c7e90a2ee211e79\\out" "/LIBPATH:D:\\a\\ky\\ky\\target\\x86_64-pc-windows-msvc\\release\\build\\zstd-sys-8deae39dfcf40489\\out" "/LIBPATH:C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\rustcaoipm6\\libzstd_sys-c946d6399ca63c58.rlib" "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\rustcaoipm6\\libmdbx_sys-5fd7a177ed6819e6.rlib" "C:\\Rust\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libcompiler_builtins-c115f0a110b00510.rlib" "bcrypt.lib" "advapi32.lib" "cfgmgr32.lib" "gdi32.lib" "kernel32.lib" "msimg32.lib" "ole32.lib" "opengl32.lib" "shell32.lib" "user32.lib" "winspool.lib" "advapi32.lib" "ws2_32.lib" "userenv.lib" "libcmt.lib"
  = note:    Creating library D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.lib and object D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.exp

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtClose referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtQuerySystemInformation referenced in function mdbx_osal_bootid

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtCreateSection referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtMapViewOfSection referenced in function mdbx_mmap

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtUnmapViewOfSection referenced in function mdbx_mresize

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtAllocateVirtualMemory referenced in function mdbx_mresize

          libmdbx_sys-5fd7a177ed6819e6.rlib(mdbx.o) : error LNK2019: unresolved external symbol NtFreeVirtualMemory referenced in function mdbx_mresize

          D:\a\ky\ky\target\x86_64-pc-windows-msvc\release\deps\ky.exe : fatal error LNK1120: 7 unresolved externals

          

error: aborting due to previous error

error: could not compile `ky`

To learn more, run the command again with --verbose.
Error: The process 'C:\Rust\.cargo\bin\cargo.exe' failed with exit code 101

I am still new to Rust, so any help would be appreciated.

Thanks.

Remove the PolyDatabase type

The PolyDatabase was created to allow library users to open an LMDB database without specifying the key and data types. However, I think this type is just a redundant struct that forces me to port every new method and feature to both the Database and PolyDatabase types.

We should remove this struct in favor of a simple alias on the Database struct with ByteSlices, making it as simple to use as before and reducing the amount of code to maintain. The PolyDatabase code could then be replaced by, for example, a new DupDatabase struct that supports duplicate data.

pub type PolyDatabase = Database<ByteSlice, ByteSlice>;

Support duplicate keys

It doesn't appear that the API supports databases with multiple values for a key (the MDB_DUPSORT flag).

Type check the database openings

When you open a database you specify the key and data types, but the library doesn't currently ensure that the types used the second and subsequent times you open the database are the same as the first time.

We could ensure that the database types are the same at runtime. For now, the following technique can't ensure that types are the same between program runs. Redis does this by asking for type names and storing those on disk.

Here is an example of a possible system ensuring the types at runtime only.
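
One way such a runtime-only check might look is sketched here. TypeRegistry and its check method are hypothetical names made up for this illustration; the sketch only remembers the TypeId pair seen on the first opening within the current program run.

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical registry remembering the key/data types used the first time
/// each database (named or unnamed) was opened during this program run.
#[derive(Default)]
pub struct TypeRegistry {
    opened: HashMap<Option<String>, (TypeId, TypeId)>,
}

impl TypeRegistry {
    /// Records the types on the first opening and rejects any later opening
    /// of the same database that uses different types.
    pub fn check<KC: 'static, DC: 'static>(&mut self, name: Option<&str>) -> Result<(), String> {
        let requested = (TypeId::of::<KC>(), TypeId::of::<DC>());
        let key = name.map(str::to_owned);
        match self.opened.get(&key) {
            Some(known) if *known != requested => {
                Err(format!("database {:?} was already opened with different types", name))
            }
            Some(_) => Ok(()),
            None => {
                self.opened.insert(key, requested);
                Ok(())
            }
        }
    }
}
```

Because TypeId values are not stable across compilations, this approach cannot be persisted to disk, which matches the limitation stated above.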

Question about environment sharing

Hi,

I'm migrating from mozilla's rkv, since I'm interested in using mdbx and heed also provides all the functionality I need out of the box.

I have a doubt regarding the instantiation of Environments. In rkv there is a Manager interface for operating the database, in order to guarantee that the same environment is not opened twice.

While reviewing this crate, I noticed that you already wrap the environment with an RwLock. So, is it correct to assume that heed will uphold the "single environment per process" guarantee without a need for an ad-hoc manager in my application code? Can I safely open and use heed::Env directly across many async handlers in my application?

I'm fairly new to Rust, so I'm sorry if this is an obvious question.

Thank you!

master with mdbx does not compile

error[E0308]: mismatched types
   --> /home/greg/.local/share/cargo/git/checkouts/heed-e741d25df84a0eb8/755ef1e/heed/src/db/polymorph.rs:219:17
    |
219 |                 txn.txn,
    |                 ^^^^^^^ expected *-ptr, found struct `txn::RoTxn`
    |
    = note: expected raw pointer `*mut mdbx_sys::MDBX_txn`
                    found struct `txn::RoTxn<T>`

Segfault on mdbx with small max_dbs

Repro:

let env = heed::EnvOpenOptions::new().max_dbs(10).open(&config.state_path).unwrap();
let db = env.create_database(Some("foo"));

Increasing max_dbs to 20 seems to work around the issue.

Moving from zerocopy to bytemuck

This library currently uses the zerocopy crate to provide some useful codecs for common types and for types that implement the AsBytes/FromBytes traits (e.g. u8, [u8]). I find it complex, if not impossible, to contribute to this crate: it has no repository of its own because it was developed for the new Fuchsia OS, so its sources are mixed with the sources of the OS (in a sub-repository at least).

The bytemuck library seems to be much easier to contribute to: it is hosted on GitHub and seems to be much more popular than the former (114k downloads per month compared to 15k). It brings a better API, at least for heed, as it can return information about which kind of problem happened when a cast fails, which would simplify some codecs.

I looked into the crate and found that the change, despite breaking, will be easy. I also found out that we can cast slices of different types by using the try_cast_slice function.

I am just not sure how to expose integers as big-endian, ensuring that the global order is preserved. We may need to create codecs for every integer type, like what zerocopy has done using byteorder.
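
To illustrate why big-endian matters for ordering, here is a small standard-library sketch. The function names are hypothetical and this is not a bytemuck or heed codec: it only shows that big-endian bytes compare lexicographically in the same order as the integers they encode.

```rust
/// Encodes a `u32` as big-endian bytes, so that byte-wise (lexicographic)
/// comparison of the encoded keys matches numeric order.
pub fn encode_be_u32(n: u32) -> [u8; 4] {
    n.to_be_bytes()
}

/// Decodes a big-endian `u32`, returning `None` when the length is not 4 bytes.
pub fn decode_be_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut array = [0u8; 4];
    array.copy_from_slice(bytes);
    Some(u32::from_be_bytes(array))
}
```

With little-endian bytes the same comparison breaks: 1 encodes to a byte string that sorts after the encoding of 256.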

Create two different libraries: heed and heedx

As a Reddit user pointed out, using a feature flag to make heed work with either LMDB or MDBX is not the best way to do so.
I did this because it was the easiest way to keep the same code base and only tune some functions.

Recently I found a potential alternative: publish two libraries with two different names but keep the same code base.

#[cfg(pkg_name = "bonjour-hello")]
fn main() {
    println!("bonjour-hello");
}

#[cfg(pkg_name = "bonjour")]
fn main() {
    println!("bonjour");
}

#[cfg(pkg_name = "hello")]
fn main() {
    println!("hello");
}
bonjour-hello; RUSTFLAGS='--cfg pkg_name="bonjour"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/bonjour-hello`
bonjour
bonjour-hello; RUSTFLAGS='--cfg pkg_name="hello"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
     Running `target/debug/bonjour-hello`
hello
bonjour-hello; RUSTFLAGS='--cfg pkg_name="bonjour-hello"' cargo run
   Compiling bonjour-hello v0.1.0 (/Users/clementrenault/Documents/bonjour-hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.25s
     Running `target/debug/bonjour-hello`
bonjour-hello

I am here to mentor anyone who is interested in helping me fix this issue. If someone would like to take a look at where the action takes place, they can look at the mdb/mod.rs file; this is, in part, where the changes will be made.

Env open with `Flags::MdbNoSubDir` flag requires target file to exist

I'm trying to create a DB like "some_dir/custom.mdb". Where "custom.mdb" is a file, not a directory. I use Flags::MdbNoSubDir to achieve it.
The current implementation requires the target file to exist due to the use of canonicalize, even though the underlying mdb_env_open works without the file being created first.

It works if I create an empty file manually, but it would be great to not touch the DB file explicitly.
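
A possible direction, sketched with the standard library only: canonicalize the parent directory, which must exist, and re-attach the file name, which may not exist yet. The helper name is hypothetical and this is not how heed currently resolves paths.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalizes a path whose final component (the database file) may not
/// exist yet: only the parent directory is resolved, then the file name is
/// appended again.
pub fn canonicalize_allowing_missing_file(path: &Path) -> io::Result<PathBuf> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let parent = match path.parent() {
        // A bare file name has an empty parent; treat it as the current directory.
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    Ok(parent.canonicalize()?.join(file_name))
}
```

The parent directory still has to exist, so a call with "some_dir/custom.mdb" fails only when "some_dir" is missing.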

mdbx crashes with SIGBUS in builds with debug-assertions=false

(Latest nightly Rust, latest versions on crates.io, FreeBSD -CURRENT)

create_database crashes when built with debug-assertions = false. This is purely about debug-assertions, i.e. there's no crash if debug-assertions = true is added to the default [profile.release] which has opt-level = 3.

  * frame #0: 0x0000000001e3dc87 unrelentingtech`mdbx_cursors_eot(txn=0x0000621000003d00, merge=1) at mdbx.c:8575:28
    frame #1: 0x0000000001e3c8d2 unrelentingtech`mdbx_txn_commit(txn=0x0000621000003d00) at mdbx.c:10606:5
    frame #2: 0x0000000001e33cd8 unrelentingtech`heed::txn::RoTxn$LT$T$GT$::commit::h728b59f0bea95370(self=<unavailable>) at txn.rs:31:42
    frame #3: 0x0000000001e3434f unrelentingtech`heed::txn::RwTxn$LT$T$GT$::commit::h94ff7090c06fdc54(self=<unavailable>) at txn.rs:97:9
    frame #4: 0x0000000001e2df8d unrelentingtech`heed::env::Env::raw_create_database::h0c6de3631f89b0ab(self=0x00007fff00000000, name=<unavailable>, types=Option<(core::any::TypeId, core::any::TypeId)> @ 0x00007fffffffc260, parent_wtxn=<unavailable>) at env.rs:343:17
    frame #5: 0x00000000019d6ba9 unrelentingtech`heed::env::Env::create_database_with_txn::hd177dd60862eeb69(self=<unavailable>, name=<unavailable>, parent_wtxn=<unavailable>) at env.rs:293:9
    frame #6: 0x00000000019d6494 unrelentingtech`heed::env::Env::create_database::heb910b58d50d434d(self=0x00007fffffffc6a0, name=<unavailable>) at env.rs:278:18
* thread #1, name = 'unrelentingtech', stop reason = signal SIGBUS: hardware error
    frame #0: 0x0000000002d029f6 unrelentingtech`mdbx_cursors_eot(txn=0x0000621000003d00, merge=1) at mdbx.c:8575:28
   8572
   8573   for (i = txn->mt_numdbs; --i >= 0;) {
   8574     for (mc = cursors[i]; mc; mc = next) {
-> 8575       unsigned stage = mc->mc_signature;
   8576       mdbx_ensure(txn->mt_env,
   8577                   stage == MDBX_MC_SIGNATURE || stage == MDBX_MC_WAIT4EOT);
   8578       next = mc->mc_next;
(lldb) fr v
(MDBX_txn *) txn = 0x0000621000003d00
(unsigned int) merge = 1
(MDBX_cursor **) cursors = 0x0000621000004b38
(MDBX_cursor *) mc = 0xbebebebebebebebe
(MDBX_cursor *) next = 0x00007fffffffab60
(MDBX_cursor *) bk = 0x00007fffffffab20
(MDBX_xcursor *) mx = 0x00007fffffffa860
(int) i = 2
(unsigned int) stage = 4294943168
(lldb) p *cursors
(MDBX_cursor *) $0 = 0x0000000000000000
(lldb) p cursors[1]
(MDBX_cursor *) $1 = 0x0000000000000000
(lldb) p cursors[2]
(MDBX_cursor *) $2 = 0xbebebebebebebebe
(lldb) p cursors[3]
(MDBX_cursor *) $3 = 0xbebebebebebebebe
(lldb) p cursors[4]
(MDBX_cursor *) $4 = 0xbebebebebebebebe

Rework heed to be lightweight and simpler to maintain

  • Create a sub-crate for lmdb-sys in this repository, based on the mdb.master.3 branch of LMDB.
  • Move from zerocopy to bytemuck (also related to licensing). #82
  • Be able to use MDB_WRITEMAP with heed to create databases. #116
  • Remove the generic types of the RoTxn and RwTxn. #94
  • #88. This is related to bytemuck. Don't forget to make heed::Error: Send + Sync.
  • Reintroduce a faster Database::len function. #56
  • Close all the issues/PRs that are related to MDBX as we no longer support it and write about that in the README. #125 (comment)
  • Delete all of the branches that are not interesting anymore, and move from a master to a main branch.
  • Document the safety issues with the unnamed database. #40
  • We should rework the usage of the OwnedType combined with BEU32 types too!

Running the tests changes the test LMDB env files

When we run the tests, the test files, which are LMDB environments, are modified. I find this strange: tests must be reproducible and should not change the on-disk data, or should at least generate the same content.

cargo test

Fix the Database implementation to address the documentation

The issue with the current heed library is that it does not ensure either of the following two points. Here is the documentation of the mdb_dbi_open LMDB function:

  1. The database handle will be private to the current transaction until the transaction is successfully committed. If the transaction is aborted the handle will be closed automatically. After a successful commit the handle will reside in the shared environment, and may be used by other transactions.
  2. This function must not be called from multiple concurrent transactions in the same process. A transaction that uses this function must finish (either commit or abort) before any other transaction in the process may use this function.

Remove the `vendored` feature

We should consider removing the vendored feature from heed as it can bring more issues than help. I would like to remove the possibility of being able to use another version of LMDB than the one provided by heed.

Support for custom key comparison function

LMDB in its great design supports a lot of features, one that is quite interesting though is the fact that it allows users to define their own comparison function. By default, when no custom function is defined, the keys are compared lexically, with shorter keys collating before longer keys.

If we want to support this, a new design must be found. This comparison function must be Rust idiomatic in the sense that it must get decoded values (not the raw bytes) as parameters and return an Ordering struct. It would be preferable if this function doesn't impact the Database type, in the sense that it would be preferable to use an Option<fn> instead of an Option<F> where F: Fn(), this way it would make the Database as easily Clonable/Copyable as before.
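
A rough sketch of the shape described above, assuming u32 keys stored as big-endian bytes. KeyOrder, decode_u32 and descending are hypothetical names. The point is only that storing a plain fn pointer keeps the handle Copy, while the comparison itself receives decoded values and returns an Ordering.

```rust
use std::cmp::Ordering;

/// Hypothetical handle: it stores a plain `fn` pointer, so it stays `Copy`.
#[derive(Clone, Copy)]
pub struct KeyOrder {
    /// `None` means the default byte-wise (lexical) order.
    pub cmp: Option<fn(u32, u32) -> Ordering>,
}

/// Decodes a big-endian `u32` key, or `None` if the length is wrong.
fn decode_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut array = [0u8; 4];
    array.copy_from_slice(bytes);
    Some(u32::from_be_bytes(array))
}

impl KeyOrder {
    /// Compares two raw keys: decoded values go through the custom function,
    /// anything else falls back to the lexical order of the raw bytes.
    pub fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        match (self.cmp, decode_u32(a), decode_u32(b)) {
            (Some(f), Some(x), Some(y)) => f(x, y),
            _ => a.cmp(b),
        }
    }
}

/// Example comparison function: larger integers sort first.
pub fn descending(a: u32, b: u32) -> Ordering {
    b.cmp(&a)
}
```

How such a function would be handed to LMDB's C callback is left out here; that part would need an extern "C" trampoline and is beyond this sketch.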

Make transaction opening more safe

Heed could ensure that only one write transaction is ever opened on the same thread.

It can create a thread_local atomic counter for write transactions and raise an error (panic or not) when the user tries to open a write transaction while another one is already open.

According to the LMDB documentation, there must never be more than one transaction on the same thread at any time. We could enforce that when the read/write_txn function is called by writing into a global variable and checking that no other transaction is already open.

A thread can only use one transaction at a time, plus any child transactions. Each transaction belongs to one thread. See below. The MDB_NOTLS flag changes this for read-only transactions.
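
The per-thread check could be sketched as follows. This uses a thread-local Cell<bool> flag instead of the atomic counter mentioned above, since a thread-local value needs no atomics. WriteTxnGuard is a hypothetical stand-in for a write transaction.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    /// True while a write transaction is open on the current thread.
    static WRITE_TXN_OPEN: Cell<bool> = Cell::new(false);
}

/// Hypothetical guard standing in for a write transaction. The raw-pointer
/// marker makes it `!Send`, so it is always dropped on the thread that opened it.
pub struct WriteTxnGuard {
    _not_send: PhantomData<*const ()>,
}

impl WriteTxnGuard {
    /// Returns an error instead of opening a second write transaction on this thread.
    pub fn begin() -> Result<WriteTxnGuard, &'static str> {
        WRITE_TXN_OPEN.with(|open| {
            if open.replace(true) {
                Err("a write transaction is already open on this thread")
            } else {
                Ok(WriteTxnGuard { _not_send: PhantomData })
            }
        })
    }
}

impl Drop for WriteTxnGuard {
    fn drop(&mut self) {
        // Clear the flag so the thread can open a new write transaction.
        WRITE_TXN_OPEN.with(|open| open.set(false));
    }
}
```

Returning an error rather than panicking is a design choice of this sketch; the text above leaves that decision open.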

Make more types Send+Sync

For my program I need Send+Sync on more types because they are used in an async context. Here's an example error for iterators:

`*mut lmdb_sys::MDB_cursor` cannot be sent between threads safely

It would also be nice if heed::Error was Send+Sync.

Is that possible or do you not support this?

Make the environments opening safe

We could use the same approach as the rkv::Manager to ensure that an environment located at a given path is opened only once per program.

It is a simple singleton HashMap storing the canonicalized paths and their associated environments. When the user wants to open one, they use the Manager, which will either open it or return the already opened one, using an RwLock on the HashMap to ensure no concurrent opening is done.
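
A minimal sketch of that idea with the standard library. Env here is a hypothetical placeholder struct, not heed's Env, and Manager does not reproduce rkv's actual interface.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// Hypothetical placeholder for an opened environment.
pub struct Env {
    pub path: PathBuf,
}

/// Maps each canonicalized path to its single opened environment.
#[derive(Default)]
pub struct Manager {
    envs: RwLock<HashMap<PathBuf, Arc<Env>>>,
}

impl Manager {
    /// Returns the already opened environment for `path`, or opens it once.
    pub fn get_or_open(&self, path: &Path) -> io::Result<Arc<Env>> {
        let canonical = path.canonicalize()?;
        // Fast path: a shared read lock is enough when the env is already known.
        if let Some(env) = self.envs.read().unwrap().get(&canonical) {
            return Ok(Arc::clone(env));
        }
        // Slow path: the write lock serializes concurrent openings, and `entry`
        // re-checks the map in case another thread won the race.
        let mut envs = self.envs.write().unwrap();
        let env = envs
            .entry(canonical.clone())
            .or_insert_with(|| Arc::new(Env { path: canonical }))
            .clone();
        Ok(env)
    }
}
```

Two calls with paths that canonicalize to the same location, such as "dir" and "dir/.", return handles to the same Env.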

Think about reintroducing the `MDB_IDL_LOGN` features

The Mozilla team changed the midl.h file and made the MDB_IDL_LOGN define customizable using a Cargo TOML feature. This change was made to reduce the number of free pages when opening a database in read-write mode.

A bunch of questions for this issue to be closed:

  1. Do we care about this parameter for Meilisearch?
  2. Do other companies need this?
  3. Should we reintroduce this patch to the LMDB source code?
  4. Should hyc introduce the #ifndef #endif changes in the official source code?

Support dynamic database types

With the current BytesEncode/Decode traits it is feasible to open databases without fixing the types for the entire lifetime of the Database, choosing them instead for each get/put call.

Doing so would be useful for a Database that stores different data types, but what about iterators? How would we deal with them?

One solution could be to specify that an iterator always yields keys/data of the same types for the lifetime of the Iter.

Support of safe environment deletion

There is a fundamental API that heed must support for the new MeiliSearch engine to be able to support index deletion: mdb_env_close.

This function closes a given environment. After an environment is safely closed, it is possible to safely delete it from disk too.

The problem with this function call is that the API user must ensure that no transactions are still alive, but there is no way to ensure that ourselves as the transaction mutexes are handled by LMDB itself. The only way would be to redefine the mutex logic ourselves.

This is why this issue is here: defining the mutex logic in heed and disabling the LMDB one (only if a feature is enabled).

I took a look at the best library for synchronization primitives in Rust, parking_lot, and the conclusion is that we should use a Mutex for write transactions and a RwLock for read-only transactions.

Thanks to the RwLock we are able to block readers by write-locking it, preventing readers from acquiring read transactions. This allows heed to ensure that it has exclusive access for both the write and the read parts, allowing it to safely delete the environment or perform any other operation that requires exclusive access.

There is one small remaining problem: ensuring that the db handle is not used again by the library user. Maybe trying to lock a deleted db should return a specific error, informing the library user that this db handle is invalid.

To enable the user-end locking feature we must first disable the inherent locking system of LMDB. We can do so by enabling the NO_LOCK flag when opening an environment.

However, I am not sure about the description of the NO_LOCK flag of the mdb_env_open function.
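
The locking scheme can be sketched with std::sync primitives standing in for parking_lot. EnvLocks and its methods are hypothetical names, and the guards stand in for real transactions.

```rust
use std::sync::{Mutex, MutexGuard, RwLock, RwLockReadGuard};

/// Hypothetical lock pair: a `Mutex` serializes writers and an `RwLock`
/// tracks readers, as described above.
#[derive(Default)]
pub struct EnvLocks {
    writer: Mutex<()>,
    readers: RwLock<()>,
}

impl EnvLocks {
    /// Stands in for opening a read-only transaction.
    pub fn read_txn(&self) -> RwLockReadGuard<'_, ()> {
        self.readers.read().unwrap()
    }

    /// Stands in for opening a write transaction.
    pub fn write_txn(&self) -> MutexGuard<'_, ()> {
        self.writer.lock().unwrap()
    }

    /// Runs `f` only if no reader and no writer is active, holding both
    /// locks exclusively for the duration. Returns `None` otherwise.
    pub fn try_exclusive<R>(&self, f: impl FnOnce() -> R) -> Option<R> {
        let _writer = self.writer.try_lock().ok()?;
        let _readers = self.readers.try_write().ok()?;
        Some(f())
    }
}
```

The closure passed to try_exclusive is where closing and deleting the environment would happen; a blocking variant could use lock and write instead of the try_ versions.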

For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins.

Does it mean that it is not possible to have concurrent readers and writers when the NO_LOCK flag is set?

Is it possible to get range working with slice types?

All the slice types define EItem = [T], since [T] is not Sized, RangeFrom<&[T]> (and others) do not implement RangeBounds.
Is there some workaround that I am missing?

My use case is a simple key like [u8; N] or Vec<u8>, etc.
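
One workaround that exists in the standard library is sketched below: a tuple of Bound values implements RangeBounds for unsized targets. The example uses a BTreeMap as a stand-in, and whether heed's range methods accept the same tuple is an assumption that would need checking against the heed version in use.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Collects every key greater than or equal to `start`. A `(Bound, Bound)`
/// tuple is used because `RangeFrom<&[u8]>` does not implement
/// `RangeBounds<[u8]>`, while `(Bound<&[u8]>, Bound<&[u8]>)` does.
pub fn keys_from(map: &BTreeMap<Vec<u8>, u32>, start: &[u8]) -> Vec<Vec<u8>> {
    let bounds: (Bound<&[u8]>, Bound<&[u8]>) = (Bound::Included(start), Bound::Unbounded);
    map.range::<[u8], _>(bounds).map(|(k, _)| k.clone()).collect()
}
```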

Trait std::iter::DoubleEndedIterator

Since an LMDB cursor is capable of quickly going to the last entry and getting the previous entry, it should support DoubleEndedIterator, allowing last-to-first iteration of a range.
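
As an illustration of the trait shape, here is a toy cursor over an in-memory sorted slice. The Cursor type is hypothetical; a real implementation would move an LMDB cursor to the last entry and step backwards instead of slicing.

```rust
/// Hypothetical cursor over entries that are already sorted by key.
pub struct Cursor<'a> {
    pub entries: &'a [(Vec<u8>, Vec<u8>)],
}

impl<'a> Iterator for Cursor<'a> {
    type Item = &'a (Vec<u8>, Vec<u8>);

    /// Yields the first remaining entry, like moving a cursor forward.
    fn next(&mut self) -> Option<Self::Item> {
        let (first, rest) = self.entries.split_first()?;
        self.entries = rest;
        Some(first)
    }
}

impl<'a> DoubleEndedIterator for Cursor<'a> {
    /// Yields the last remaining entry, like jumping to the end and stepping back.
    fn next_back(&mut self) -> Option<Self::Item> {
        let (last, rest) = self.entries.split_last()?;
        self.entries = rest;
        Some(last)
    }
}
```

Once next_back exists, adapters such as rev() come for free, which gives the last-to-first iteration described above.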

Make it compile and work on Windows

error[E0433]: failed to resolve: could not find `unix` in `os`
 --> C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\heed-0.5.0\src\env.rs:5:14
  |
5 | use std::os::unix::io::AsRawFd;
  |              ^^^^ could not find `unix` in `os`

error[E0599]: no method named `as_raw_fd` found for type `std::fs::File` in the current scope
   --> C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\heed-0.5.0\src\env.rs:307:23
    |
307 |         let fd = file.as_raw_fd();
    |                       ^^^^^^^^^ method not found in `std::fs::File`

error: aborting due to 2 previous errors

Document the safety issues with the unnamed database

LMDB has an important restriction on the unnamed database: when named databases are opened, their names are stored as keys in the unnamed one and are immutable.

I faced a big bug that triggered a SIGSEGV when copying all of the entries of one unnamed database into another env: heed tried to write the values associated with the opened named databases multiple times, which sometimes triggered a SIGSEGV.

I didn't take the time to reproduce this behavior, but we must, at least, document it!

Support custom encoding/decoding errors

It would be better and easier to debug a program if the BytesEncode and BytesDecode traits could return any error type.

To do so we need to modify the Error enum and more specifically the Encoding and Decoding variants to wrap a Box<dyn Error>.
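
A sketch of what the decoding side might look like under this proposal. The trait below is a simplified, hypothetical version written for this illustration and does not match heed's actual BytesDecode signature. Str and BoxedError are likewise illustrative names.

```rust
use std::error::Error as StdError;

/// Boxed error type that any codec can return.
pub type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

/// Hypothetical decoding trait that returns a boxed error on failure.
pub trait BytesDecode<'a> {
    type DItem: 'a;
    fn bytes_decode(bytes: &'a [u8]) -> Result<Self::DItem, BoxedError>;
}

/// Example codec: borrows the bytes as a UTF-8 string slice.
pub struct Str;

impl<'a> BytesDecode<'a> for Str {
    type DItem = &'a str;

    fn bytes_decode(bytes: &'a [u8]) -> Result<Self::DItem, BoxedError> {
        // The original `Utf8Error` is preserved inside the box.
        std::str::from_utf8(bytes).map_err(Into::into)
    }
}

/// Error enum whose decoding variant wraps whatever the codec returned.
#[derive(Debug)]
pub enum Error {
    Decoding(BoxedError),
}
```

The caller can then recover the concrete cause with downcast_ref, for instance to tell a UTF-8 failure apart from a length mismatch.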

Assertion 'root > 1' failed in mdb_page_search()

Hi there!
I am building a system that uses heed and while running some tests it prints the following and then crashes:

lmdb-sys/lmdb/libraries/lib:6637: Assertion 'root > 1' failed in mdb_page_search()

It is unclear to me why the error sometimes pops up and sometimes does not. My tests are really simple: there is just one thread storing data and reading it. Could someone explain to me what this error means?

Unable to create database when MDB_WRITEMAP is set

To reproduce:

let path = Path::new("target").join("test.mdb");
fs::create_dir_all(&path)?;
let mut env_builder = EnvOpenOptions::new();
unsafe {
    env_builder.flag(Flags::MdbNoSync);
    env_builder.flag(Flags::MdbWriteMap);
}
let env = env_builder.map_size(10 * 1024 * 1024 * 1024).max_dbs(1000).open(path)?;
let db: Database<ByteSlice, Unit> = env.create_database(Some("test"))?;

fails with:

Error: Mdb(BadTxn)

Works fine if env_builder.flag(Flags::MdbWriteMap); is commented out.
