
ccdt's Introduction

2023_KCC Model

This repository contains the code for the Korea Computer Congress 2023 (KCC 2023) paper

'A Study about Search Space of Knob Range Reduction for Database Tuning'.

This study proposes a search-space reduction method that narrows the value ranges of database parameters (knobs) so that an optimization algorithm can tune database performance more effectively.


- DBMS: MySQL 5.7

- Number of knobs (parameters): 139

- Number of configurations: 200

- Workloads: TPC-C, Twitter


Firstly, we randomly generate 200 samples via Latin Hypercube Sampling (LHS).
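A minimal sketch of this sampling step, assuming SciPy's `scipy.stats.qmc` module and a few illustrative knob bounds (the actual study samples all 139 MySQL 5.7 knobs):

```python
import numpy as np
from scipy.stats import qmc

# Illustrative knob bounds; the study samples 139 MySQL 5.7 knobs.
knob_bounds = {
    "innodb_buffer_pool_size": (128 * 1024**2, 8 * 1024**3),  # bytes
    "innodb_thread_concurrency": (0, 64),
    "max_connections": (100, 5000),
}

sampler = qmc.LatinHypercube(d=len(knob_bounds), seed=0)
unit_samples = sampler.random(n=200)                  # 200 points in the unit hypercube

lows = np.array([lo for lo, _ in knob_bounds.values()])
highs = np.array([hi for _, hi in knob_bounds.values()])
configs = qmc.scale(unit_samples, lows, highs)        # map each point onto the knob ranges

print(configs.shape)                                  # (200, 3) candidate configurations
```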

Secondly, we select 10 knobs that have a significant impact on database performance by a knob ranking algorithm.
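The issues below mention that the Top-K knobs are obtained with the SHAP algorithm; a minimal sketch of such a ranking, assuming a tree-based surrogate model and the `shap` library (the configurations and scores below are random placeholders, not the benchmark measurements):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
knob_names = [f"knob_{i}" for i in range(139)]   # placeholder names for the 139 MySQL knobs
configs = rng.random((200, len(knob_names)))     # placeholder for the LHS samples
scores = rng.random(200)                         # placeholder for measured performance scores

# Fit a tree-based surrogate of performance as a function of the knob values.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(configs, scores)

# Mean |SHAP value| per knob serves as its importance for ranking.
shap_values = shap.TreeExplainer(model).shap_values(configs)
importance = np.abs(shap_values).mean(axis=0)

top10_knobs = [knob_names[i] for i in np.argsort(importance)[::-1][:10]]
```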

Thirdly, we select 10 configurations from the generated samples based on their measured database performance, computing a score (throughput / latency) to compare configurations.
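A sketch of this selection step, assuming the measurements are collected in a pandas DataFrame (the tps and latency values below are random placeholders for the TPC-C / Twitter benchmark results):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Placeholder measurements; in the repository these come from the benchmark runs.
metrics = pd.DataFrame({
    "config_id": range(200),
    "tps": rng.uniform(500, 3000, 200),
    "latency": rng.uniform(10, 200, 200),
})

# Higher throughput and lower latency both raise the score.
metrics["score"] = metrics["tps"] / metrics["latency"]

top10 = metrics.sort_values("score", ascending=False).head(10)
top10_config_ids = top10["config_id"].tolist()
```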

Then, we find the range of values that each selected knob actually takes across the selected configurations.
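Continuing the sketches above, the reduced range of each knob can be taken as the span of values used by the selected configurations (variable names follow the previous sketches and are illustrative):

```python
import pandas as pd

# configs / knob_names / top10_knobs / top10_config_ids come from the sketches above.
configs_df = pd.DataFrame(configs, columns=knob_names)

# Restrict to the best-performing configurations and the highest-ranked knobs.
selected = configs_df.loc[top10_config_ids, top10_knobs]

# Reduced range = the span of values those configurations actually used for each knob.
reduced_ranges = {knob: (selected[knob].min(), selected[knob].max())
                  for knob in top10_knobs}
```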

With these newly defined knob ranges, the optimization algorithm can search knob values within a narrower range than its default range.
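The issues below refer to SMAC (SMAC_with_our_data_twitter.ipynb); one way such an optimizer could consume the reduced ranges is through a ConfigSpace search space. A sketch using the classic ConfigSpace API, treating every knob as a continuous hyperparameter for simplicity:

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter

cs = ConfigurationSpace(seed=0)
for knob, (low, high) in reduced_ranges.items():
    # Each knob is searched over the narrowed [low, high] instead of its full default range.
    cs.add_hyperparameter(UniformFloatHyperparameter(knob, lower=float(low), upper=float(high)))
```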

Paper

Below is the link to the paper 'A Study about Search Space of Knob Range Reduction for Database Tuning':
Paper link

ccdt's People

Contributors

kwon-sein · addb-swstarlab


Forkers

kwon-sein · hyojoys

ccdt's Issues

About Model Selection

Why did you choose an ensemble model even though there are various supervised learning models other than ensemble architectures?

About XGBRegressor in SMAC_with_our_data_twitter.ipynb

Hello,

In the current code, you use fixed hyperparameter values for XGBRegressor.
Hyperparameters are key factors that determine a model's performance, and finding values optimized for the specific problem is important.
Additionally, I'm wondering whether you've applied hyperparameter tuning techniques such as Grid Search, Random Search, or Bayesian Optimization.
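For example, something along the lines of the following sketch (the parameter grid and data are illustrative placeholders, not the repository's code):

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_train = rng.random((200, 139))     # placeholder knob configurations
y_train = rng.random(200)            # placeholder performance scores

# Illustrative grid over a few common XGBRegressor hyperparameters.
param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    XGBRegressor(objective="reg:squarederror", random_state=0),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_)
```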

Thanks for reading.

About prediction model for feature selection

Looking at your code, I think you used a random forest model to select the features that are important for predicting database performance, because it provides the built-in feature_importances_ attribute.

However, the performance of the random forest prediction model seems rather poor, so I suspect the reliability of the important features it extracts is low. Have you tried a model that can use attention scores instead of the random forest or AdaBoost models, or any other kind of model?
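For reference, a minimal sketch of the impurity-based ranking I am referring to (X and y are random placeholders for the knob configurations and measured performance):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 139)),
                 columns=[f"knob_{i}" for i in range(139)])   # placeholder knob values
y = rng.random(200)                                           # placeholder performance

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ is the impurity-based importance mentioned above.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))
```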

Top-K knob

The code says that you get the Top-K knobs through the SHAP algorithm, but could LIME, PDP, or Permutation Feature Importance be used instead of SHAP?
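For instance, a rough sketch of how scikit-learn's permutation_importance could stand in for the SHAP-based ranking (the model and data are placeholders, not the repository's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.random((200, 139))     # placeholder knob configurations
y = rng.random(200)            # placeholder measured performance

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Permutation importance: performance drop when each knob's column is shuffled.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top10_idx = np.argsort(result.importances_mean)[::-1][:10]
print(top10_idx)
```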

About dataset

Hello, thanks for your work.

In Jupyter_model.ipynb, why didn't you remove outliers?

[Screenshot: 2023-03-10, 3:46 PM]

I think the marked point is an outlier.

About tps, latency

Hello,

I have two questions.
What do tps and latency mean in testing_tpcc.ipynb?

And why is the score defined as tps/latency?

Thanks!

About Code

Hello, I have some questions about the code.

In the "testing_tpcc_ipynb" file

Cell 6 and 7 is the same code?

In cell 6, you append best config in variable "best_config".
But you declare "best_config" as empty list in cell 7 and didn't use the "best_config" in cell 7.
So I wonder why you declare the "bset_config".

And cell 5 and 8, the result of metrics.sort_value('score', ascending=False) is different.
Can you tell me why?

Thank you.

About selection of clustering algorithm

When I looked at the code, I saw that the clustering technique used is k-means. However, as far as I know, there are many other clustering algorithms besides k-means, especially for configuration data in tabular form. Is there any reason why you chose k-means over other techniques?
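For example, a minimal sketch of swapping another clustering algorithm in for k-means on scaled tabular configuration data (the data here are placeholders, not the repository's code):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
configs = rng.random((200, 10))                      # placeholder tabular configuration data

X_scaled = StandardScaler().fit_transform(configs)

kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
agglo_labels = AgglomerativeClustering(n_clusters=4).fit_predict(X_scaled)
```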

About Scaler in new_idea.ipynb

Hello,

The current code uses StandardScaler and MinMaxScaler to scale the data, which adjusts the distribution of the data to help the learning algorithm perform better.

However, depending on the characteristics of the data and the requirements of the model, different scaling methods may be more appropriate.

For example, log transformation can be used to adjust the distribution of the data, or RobustScaler can be used to apply scaling that is less sensitive to outliers.
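For instance, a minimal sketch comparing those options on placeholder data (not the repository's code):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=1.0, size=(200, 2))   # placeholder, right-skewed data

X_standard = StandardScaler().fit_transform(X)   # zero mean, unit variance
X_minmax = MinMaxScaler().fit_transform(X)       # rescale to [0, 1]
X_robust = RobustScaler().fit_transform(X)       # median / IQR, less sensitive to outliers
X_log = np.log1p(X)                              # log transform to compress a skewed distribution
```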

Have you experimented with these different scaling methods? If so, what were the results?

Thanks for reading.

About Classifier Hyperparameters

Hello,

What criteria did you use to set the hyperparameters for the XGBClassifier in CCDT/new_model/main_classification.py?

I also wonder how the results differ when the hyperparameter values are adjusted.

Thanks for reading.

About Kmeans label

You set four labels with the colors 'navy', 'tomato', 'green', and 'orange', but in practice you only use three labels.

Why did you do that?

Thank you for reading.

About the reason for the score

Hi,
First of all, thanks for the great research.

My question is how the score for each config is calculated in order to pick the best-performing one. I noticed that the ranges (scales) of the TPS and latency values are very different. If I calculate the ratio of these two attributes without any normalization or scaling, it seems that the relatively large scale of TPS will dominate the score. Is there any particular reason for calculating the score without scaling?
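To make the concern concrete, here is a hedged sketch of scaling both metrics before combining them (placeholder data, not the repository's method):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
metrics = pd.DataFrame({"tps": rng.uniform(500, 3000, 200),      # placeholder measurements
                        "latency": rng.uniform(10, 200, 200)})

# Put both metrics on [0, 1] so that neither dominates the combined score by scale alone.
scaled = MinMaxScaler().fit_transform(metrics[["tps", "latency"]])
metrics["tps_norm"] = scaled[:, 0]
metrics["latency_norm"] = scaled[:, 1]

eps = 1e-9
metrics["score_scaled"] = metrics["tps_norm"] / (metrics["latency_norm"] + eps)
```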

Pick Top-10 configurations

I would like to ask about the content of the paper (A Study about Search Space of Knob Range Reduction for Database Tuning).

Why do you pick the Top-10 configurations in Figure 1? Is it related to knob range reduction?
