GithubHelp home page GithubHelp logo

uncle-justice / dbbert Goto Github PK

View Code? Open in Web Editor NEW

This project forked from itrummer/dbbert

0.0 0.0 0.0 57.12 MB

DB-BERT tunes database systems for optimal performance, using tuning hints mined from text.

Home Page: https://itrummer.github.io/dbbert

Shell 4.41% Python 95.59%

dbbert's Introduction

DB-BERT: the Tuning Tool that "Reads" the Manual

DB-BERT extracts hints for database parameter settings from text via natural language analysis. It then optimizes parameter settings for a given workload and performance metric using reinforcement learning.

Running Experiments

The following instructions have been tested on a p3.2xlarge EC 2 instance with Deep Learning Base GPU AMI (Ubuntu 20.04) 20230811 AMI. On that system, run them from the ubuntu home directory.

  1. Install PostgreSQL and MySQL, e.g., run:
sudo apt-get update
sudo apt install postgresql-12
sudo apt install mysql-server-8.0
  1. Create a user named "dbbert" with password "dbbert" for PostgreSQL and MySQL:
sudo adduser dbbert # Type "dbbert" as password twice, then press enter repeatedly, finally "Y" to confirm
sudo -u postgres psql -c "create user dbbert with password 'dbbert' superuser;"
sudo mysql -e "create user 'dbbert'@'localhost' identified by 'dbbert'; grant all privileges on *.* to 'dbbert'@'localhost'; flush privileges;"
  1. Download this repository, e.g., run:
git clone https://github.com/itrummer/dbbert
  1. Install dependencies:
sudo apt install libpq-dev=12.16-0ubuntu0.20.04.1
cd dbbert
sudo pip install -r requirements.txt
  1. Install benchmark databases using scripts in scripts folder (installing all databases may take one to two hours):
sudo scripts/installtpch.sh
sudo scripts/installjob.sh
  1. You can now run experiments with DB-BERT, e.g.:
PYTHONPATH=src python3.9 src/run/run_dbbert.py demo_docs/postgres100 64000000000 200000000000 8 pg tpch dbbert dbbert "sudo systemctl restart postgresql" /tmp/tpchdata/queries.sql --recover_cmd="sudo rm /var/lib/postgresql/12/main/postgresql.auto.conf"

During execution, DB-BERT generates two result files:

  • dbbert_results_performance: contains tab-separated rows describing performance measurements for each trial run. The columns represent (from left to right) the run counter, the evaluation counter within the run, the time since the start of the current run in milliseconds, the optimal performance achieved over all evaluations within the run (e.g., if minimizing run time, this is the minimal query execution time in milliseconds), and the performance measured for the current trial run.
  • dbbert_results_configure: contains tab-separated rows describing configurations used for each trial run. The columns represent (from left to right) the run counter, the evaluation counter within the run, the time since the start of the current run in milliseconds, the optimal configuration over all evaluations within the run, and the current configuration. Each configuration is represented by a dictionary, mapping parameter names to their values (it only contains parameters with non-default settings).

See next section for explanations on DB-BERT's command line parameters.

Using DB-BERT: CLI

Use DB-BERT from the command line using src/run/run_dbbert.py.

Required Parameters

The module requires setting the following parameters:

Parameter Explanation
text_source_path path to text with tuning hints. Text is stored as .csv file, containing document IDs and text snippets. You find example documents in the demo_docs folder.
memory the amount of main memory of the target platform, measured in bytes.
disk the amount of disk space on the target platform, measured in bytes.
cores the number of cores available on the target platform.
dbms whether to tune PostgreSQL (set to pg) or MySQL (set to ms).
db_name name of database on which target workload is running.
db_user name of database login with access to target database.
db_pwd password of database login.
restart_cmd command for restarting database server from command line. E.g., "sudo systemctl restart postgresql" or "sudo systemctl restart mysql".
query_path path to .sql file containing queries of target workload, separated by semicolon (no semicolon after the last query!).

Note: specifying recover_cmd parameter is optional but highly recommended (otherwise, sub-optimal parameter settings may prevent a restart of the database server, preventing DB-BERT from correcting faulty parameter values).

Optional Parameters

You may also want to set the following parameters:

Parameter Explanation
max_length divide text into chunks of at most that many characters for extracting tuning hints.
filter_params filter text passages via simple heuristic (set to 1, recommended) or not (set to 0).
use_implicit try to detect implicit parameter references (set to 1, recommended) or not (set to 0).
hint_order try out tuning hints in document order (set to 0), parameter order (set to 1), or optimized order (set to 2, recommended).
nr_frames maximal number of frames to execute for learning (recommendation: set high, e.g. 100000, to rely on timeout instead).
timeout_s maximal number of seconds used for tuning (this number is approximate since system finishes trial runs before checking for timeouts).
performance_scaling scale rewards due to performance improvements by this factor.
assignment_scaling scale rewards due to successful parameter value changes by this factor.
nr_evaluations number of trial runs based on the same collection of tuning hints (recommended: 2).
nr_hints number of hints to consider in combination (recommended: 20).
min_batch_size batch size used for text analysis (e.g., 8, optimal settings depend on language model).
recover_cmd command line command to reset database configuration if server restart is impossible. E.g., use "sudo rm /var/lib/postgresql/12/main/postgresql.auto.conf" for PostgreSQL.

Using DB-BERT: GUI

  • To start the GUI, run streamlit run src/run/interface.py from the DB-BERT root directory.
  • If accessing DB-BERT on a remote EC2 server, make sure to enable inbound traffic to port 8501.
  • Enter the URL shown in the console into your Web browser to access the interface.
  • You can select settings to read from configuration files in the demo_configs folder.
  • Select a collection of tuning text documents for extraction (e.g., from the demo_docs folder).
  • You may change parameter related to database access, learning, and tuning goals.
  • Click on the Start Tuning button to start the tuning process.

Resources

A video talk introducing the vision behind this project is available online. Please cite:

@inproceedings{Trummer2022,
author = {Trummer, Immanuel},
booktitle = {SIGMOD},
pages = {190--203},
title = {{DB-BERT: a Database Tuning Tool that ``Reads the Manual''}},
url = {https://doi.org/10.1145/3514221.3517843},
year = {2022}
}

@article{Trummer2021nlp,
author = {Trummer, Immanuel},
journal = {PVLDB},
number = {7},
pages = {1159--1165},
title = {{The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that “Read the Manual”}},
url = {https://doi.org/10.14778/3450980.3450984},
volume = {14},
year = {2021}
}

dbbert's People

Contributors

itrummer avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.