
[VLDB'22] Cardinality Estimation of Approximate Substring Queries using Deep Learning.

License: MIT License



(SODDY, TEDDY) & DREAM

Badges: license | ubuntu | gcc 7.5 | python 3.7 | cuda 11.6 | size | hits

This repository implements the training data generation algorithms (SODDY & TEDDY) and the deep cardinality estimators (DREAM) proposed in our paper "Cardinality Estimation of Approximate Substring Queries using Deep Learning". It was created by Suyong Kwon, Woohwan Jung and Kyuseok Shim.
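
To make the task concrete: an approximate substring query is typically given as a query string plus an edit-distance threshold, and its cardinality is the number of records containing some substring within that threshold of the query string. The sketch below computes this count by brute force with a standard dynamic program. It only illustrates the quantity being estimated; it is not the SODDY or TEDDY algorithm, and the function names and toy records are hypothetical.

```python
def min_substring_edit_distance(query, text):
    """Smallest edit distance between `query` and any substring of `text`."""
    # Row 0 is all zeros: a match may start at any position of `text` for free.
    prev = [0] * (len(text) + 1)
    for i, q_char in enumerate(query, start=1):
        curr = [i]  # matching the first i query characters against nothing
        for j, t_char in enumerate(text, start=1):
            cost = 0 if q_char == t_char else 1
            curr.append(min(prev[j] + 1,          # drop a query character
                            curr[j - 1] + 1,      # consume an extra text character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # A match may also end at any position of `text`.
    return min(prev)


def true_cardinality(records, query, threshold):
    """Count records with a substring within `threshold` edits of `query`."""
    return sum(1 for record in records
               if min_substring_edit_distance(query, record) <= threshold)


if __name__ == "__main__":
    toy_records = ["database systems", "deep learning", "data base", "query"]
    for threshold in (0, 1, 2):
        print(threshold, true_cardinality(toy_records, "databse", threshold))
```

Computing such counts exactly for many queries is what makes training data expensive to produce, which is the cost the data generation algorithms in this repository aim to reduce.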

Repository Overview

The repository consists of four folders, each of which contains its own README file and script.

Folder          Description
gen_train_data  training data generation algorithms
dream           deep cardinality estimators for approximate substring queries
astrid          a modified version of Astrid, starting from the Astrid model downloaded from [github]
plot            example notebook files

Installation and Requirements

We recommend running our code in a CUDA environment. However, the code also works without CUDA when the installed PyTorch library does not support GPUs. (You may set CUDA_VISIBLE_DEVICES to -1 to force CPU mode.)
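
As a minimal sketch of forcing CPU mode from Python, the helper below launches a command with CUDA_VISIBLE_DEVICES set to -1 in the child's environment. The helper names are hypothetical and not part of this repository; the probe command merely echoes the variable so the example runs without PyTorch installed.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_cpu_only(command, **kwargs):
    """Run `command` (a list of arguments) with CUDA devices hidden.

    Extra keyword arguments are passed through to subprocess.run.
    """
    return subprocess.run(command, env=cpu_only_env(), check=True, **kwargs)


if __name__ == "__main__":
    # Stand-in child process that only reports the variable it sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    result = run_cpu_only(probe, capture_output=True, text=True)
    print(result.stdout.strip())
```

The same helper could wrap one of the run scripts, for example `run_cpu_only(["./run.sh", "DBLP"], cwd="dream")`, assuming it is called from the repository root.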

Method 1: Use the Docker Image

Running the image requires the NVIDIA Container Toolkit. If you do not have the toolkit, refer to its installation guide.

git clone https://github.com/sykwon/teddy-dream.git
cd teddy-dream   # so that ${PWD} below mounts the repository root

# run docker image
docker run -it --gpus all --name dream -v ${PWD}:/workspace -u 1000:1000 sykwon/dream /bin/bash

# after starting docker
redis-server --daemonize yes
cd gen_train_data/
make clean && make && make info
cd ..

Method 2: Create a Virtual Python Environment

This code requires Python 3.7 or higher.

sudo apt-get install -y redis-server git
sudo apt-get install -y binutils
sudo apt-get install -y texlive texlive-latex-extra texlive-fonts-recommended dvipng cm-super

conda create -n py37 python=3.7
source activate py37
conda install -y pytorch=1.7.1 torchvision=0.8.2 cudatoolkit=11.0 -c pytorch -c nvidia

pip install -r requirements.txt
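
After installation, a quick sanity check of the environment can save a failed run later. The following is a minimal sketch, not a script shipped with this repository; the function names are hypothetical, and it only tests whether the interpreter meets the stated minimum version and whether the PyTorch packages can be located.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated above


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """True if the (major, minor) version is at least `minimum`."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) >= tuple(minimum)


def has_module(name):
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    for module in ("torch", "torchvision"):
        print(f"{module} found:", has_module(module))
```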

Datasets

  • DBLP
  • GENE
  • WIKI
  • IMDB

Examples

These commands produce the experimental results.

cd gen_train_data
./run.sh DBLP     # to generate training data from the DBLP dataset
# ./run.sh GENE   # to generate training data from the GENE dataset
# ./run.sh WIKI   # to generate training data from the WIKI dataset
# ./run.sh IMDB   # to generate training data from the IMDB dataset
# ./run.sh all    # to generate training data from all datasets
cd ..

cd dream
./run.sh DBLP    # to train all models except Astrid with the DBLP dataset
# ./run.sh GENE  # to train all models except Astrid with the GENE dataset
# ./run.sh WIKI  # to train all models except Astrid with the WIKI dataset
# ./run.sh IMDB  # to train all models except Astrid with the IMDB dataset
# ./run.sh all   # to train all models except Astrid with all datasets
cd ..

cd astrid
./run.sh DBLP    # to train the Astrid model with the DBLP dataset
# ./run.sh GENE  # to train the Astrid model with the GENE dataset
# ./run.sh WIKI  # to train the Astrid model with the WIKI dataset
# ./run.sh IMDB  # to train the Astrid model with the IMDB dataset
# ./run.sh all   # to train the Astrid model with all datasets
cd ..
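
The three stages above can also be driven from a single script. This is a sketch under the assumption that it is started from the repository root and that each folder's run.sh accepts the dataset names listed earlier (or all); the helper names are hypothetical. By default it only prints the commands it would run.

```python
import subprocess

DATASETS = ("DBLP", "GENE", "WIKI", "IMDB")
STAGES = ("gen_train_data", "dream", "astrid")  # same order as the commands above


def pipeline_steps(dataset):
    """Return (folder, command) pairs for one dataset name or "all"."""
    if dataset != "all" and dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [(folder, ["./run.sh", dataset]) for folder in STAGES]


def run_pipeline(dataset, dry_run=True):
    """Print each step and, unless dry_run is set, execute it in its folder."""
    for folder, command in pipeline_steps(dataset):
        print(f"(cd {folder} && {' '.join(command)})")
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(command, cwd=folder, check=True)


if __name__ == "__main__":
    run_pipeline("DBLP")  # dry run: prints the three commands only
```

Stopping at the first failure matters here because the training stages depend on the data produced by gen_train_data.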

Please refer to [notebook] to see the experimental results.

Citation

Please consider citing our paper if you find this code useful:

@article{kwon2022cardinality,
    title={Cardinality estimation of approximate substring queries using deep learning},
    author={Kwon, Suyong and Jung, Woohwan and Shim, Kyuseok},
    journal={Proceedings of the VLDB Endowment},
    volume={15},
    number={11},
    year={2022}
}
