learnedsystems / rmi

The recursive model index, a learned index structure

License: MIT License

Rust 92.18% Makefile 2.05% C++ 5.24% Python 0.53%

rmi's Issues

Off-by-one error in `train_two_layer` implementation

Hi,

I encountered an assert issue when training RMI on very small (100 items) datasets for testing purposes:

thread '<unnamed>' panicked at 'start index was 100 but end index was 100', [...]/RMI/rmi_lib/src/train/two_layer.rs:27:5

Upon closer investigation, I think I have found an off-by-one error in the train_two_layer function. I did not look at the context beyond the function, so take what I am about to say with a grain of salt:


Link to source file

  1. The value of split_idx is calculated here:

let split_idx = md_container.lower_bound_by(|x| {

  2. split_idx should be in the interval [0, md_container.len())

if split_idx > 0 && split_idx < md_container.len() {

Now let's look at the case where split_idx == md_container.len() - 1, which is valid per [2.]:

  3. The else branch is taken, since split_idx < md_container.len()

let mut leaf_models = if split_idx >= md_container.len() {

  4. split_idx + 1 (== md_container.len()) is passed to build_models_from as start_idx
|| build_models_from(&md_container, &top_model, layer2_model,
                                   split_idx + 1, md_container.len(),
                                   split_idx_target,
                                   second_half_models)
fn build_models_from<T: TrainingKey>(data: &RMITrainingData<T>,
                                    top_model: &Box<dyn Model>,
                                    model_type: &str,
                                    start_idx: usize, end_idx: usize,
                                    first_model_idx: usize,
                                    num_models: usize) -> Vec<Box<dyn Model>>

Link 4.1

Link 4.2

  5. Assert fails, since md_container.len() > md_container.len() is false
assert!(end_idx > start_idx,
        "start index was {} but end index was {}",
        start_idx, end_idx
);

Link 5


An obvious fix would be to change the condition in [3.] to split_idx >= md_container.len() - 1; however, I am not entirely certain whether that leads to issues in other contexts. I suspect a similar issue would occur if split_idx == 0, but only for the first call. I changed the condition in my local version and re-ran the tests, and everything seems to work just fine:

Running test cache_fix_osm
Test cache_fix_osm finished.
Running test cache_fix_wiki
Test cache_fix_wiki finished.
Running test max_size_wiki
Test max_size_wiki finished.
Running test radix_model_wiki
Test radix_model_wiki finished.
Running test simple_model_osm
Test simple_model_osm finished.
Running test simple_model_wiki
Test simple_model_wiki finished.
============== TEST RESULTS ===============
python3 report.py
PASS cache_fix_osm
PASS cache_fix_wiki
PASS max_size_wiki
PASS radix_model_wiki
PASS simple_model_osm
PASS simple_model_wiki

I can open a pull request with that fix if you would like.

Optimizer interpretation

Any thoughts on how to interpret the output of the optimizer? I'm seeing a table with entries

Models                          Branch        AvgLg2        MaxLg2       Size (b)

But I haven't found an explanation of what these mean.

Multiple layers

Trying to train with three layers fails:

cargo run --release -- books_200M_uint32 my_first_rmi linear,linear,linear 100
    Finished release [optimized + debuginfo] target(s) in 0.02s
     Running `target/release/rmi /z/downloads/books_200M_uint32 my_first_rmi linear,linear,linear 100`
thread 'main' panicked at 'explicit panic', <::std::macros::panic macros>:2:4
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is this because of an intentional limitation on the number of layers or some other issue?

Citation

Do you have a preferred citation for this work?

Radix root model not working with small branching factor

Hi, I recently tried to build a model with a radix layer and noticed that radix models do not work with small branching factors (1, 2, 3), e.g.

cargo run --release -- uniform_dense_200M_uint64 sosd_rmi radix,linear 1

I always get an error similar to this:

thread '<unnamed>' panicked at 'Top model gave an index of 2954937499648 which is out of bounds of 2. Subset range: 1 to 200000000', /private/tmp/rust-20200803-28615-1977jkb/rustc-1.45.2-src/src/libstd/macros.rs:16:9

I suspect that the issue stems from models::utils::num_bits. I guess you are computing the number of bits needed to represent the largest index (the while loop) and then subtracting 1 to ensure that the radix model always predicts an index smaller than branching_factor. However, the implementation appears to be off by one, i.e. the additional nbits -= 1 is not needed.

NN model support

I noticed that the paper describes more complex NN models powered by TensorFlow, but only simple models seem to be available here. How can I build an RMI with a more complex NN model? @RyanMarcus @anikristo

cleanup() should only be called on malloc'ed model parameters

Hi, I just trained a model with two linear_spline layers and a branching_factor of 200,000, i.e.

cargo run --release -- somedata_100M_uint64 sosd_rmi linear_spline,linear_spline 200000

The resulting C++ program does not compile with the following error:

sosd_rmi.cpp:11:5: error: no matching function for call to 'free'
    free(L1_PARAMETERS);
    ^~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/malloc/_malloc.h:42:7: note: candidate function not viable: no known conversion from 'const double [400000]' to 'void *' for 1st argument
void     free(void *);
         ^
1 error generated.

It seems like in the cleanup() function, free is called on L1_PARAMETERS although it was not allocated with malloc.

Looking at codegen::generate_code(), free_code should possibly only be generated when storage has the value `StorageConf::Disk(path)`, not in the case of `StorageConf::Embed`.
