tkbadamdorj / simple-gpu-scheduler

Hyperparameter search wrapper that uses multiple GPUs.

License: MIT License

Python 100.00%

Topics: gpu, hyperparameter-optimization, hyperparameter-search

SIMPLE GPU SCHEDULER

This wrapper lets you train different models on multiple GPUs in parallel. You don't have to add anything to code that already works: if your train.py accepts its hyperparameters as command-line flags, you can use this wrapper as-is. A minimal example of such a script is sketched below.
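For concreteness, here is a hypothetical train.py the scheduler could drive; the flag names mirror the demo, and any argparse-style script works the same way:

# train.py -- a minimal, hypothetical training script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=0.01)
parser.add_argument('--momentum', type=float, default=0.9)
parser.add_argument('--normalize', action='store_true')
parser.add_argument('--dropout', action='store_true')
args = parser.parse_args()

# ... build the model and run the training loop as usual ...
print(f'lr={args.learning_rate} momentum={args.momentum} '
      f'normalize={args.normalize} dropout={args.dropout}')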

Advantages

  • Compatible with all machine learning libraries (or, really, any Python script that you want to run on a GPU)
  • No need to add anything to your code. Works for any train.py file that allows hyperparameter specification using command line arguments (see demo)
  • Automatically picks unused GPUs with option to leave some free
  • Automates hyperparameter search
  • Both grid search and random search available

Disadvantages

  • If model training times differ widely, this simple scheduling is not efficient: if GPU 0 is assigned three models that take a long time to train while GPU 1 is assigned three small ones, GPU 1 may finish very quickly while GPU 0 marches on.

How it works

You just need to specify the hyperparameters and flags that you would like to search over in two dictionaries, all_hparams and all_flags, as in demo.py. To always pass a flag, give it the value "fixed"; to test the model both with and without a flag, give it the value "param".

all_hparams = {
    'learning_rate': [0.1, 0.01, 0.001],
    'momentum': [0.9, 0.99]
}

all_flags = {
    'normalize': 'fixed',  # always passed: every run gets --normalize
    'dropout': 'param'     # searched over: runs with and without --dropout
}
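With these settings, a grid search covers 3 learning rates × 2 momentum values × 2 dropout settings = 12 runs. Illustratively (the actual command construction lives inside the scheduler, so treat this as a sketch), the generated commands look like:

python train.py --learning_rate 0.1 --momentum 0.9 --normalize
python train.py --learning_rate 0.1 --momentum 0.9 --normalize --dropout
python train.py --learning_rate 0.1 --momentum 0.99 --normalize
...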

Then initialize an HPSearch object and run its search method. The args it needs are also defined in demo.py.

hp_search = HPSearch(all_hparams, all_flags, args)

hp_search.search()

This takes care of hyperparameter selection (random or grid search) and assigns models to unused GPUs. It controls which GPU each model uses by setting CUDA_VISIBLE_DEVICES appropriately for that model's process.
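Under the hood this amounts to launching each run as a subprocess whose environment pins it to one GPU. A minimal sketch of the mechanism (not the project's actual code):

import os
import subprocess

def launch_on_gpu(cmd, gpu_id):
    # The child process only sees the chosen GPU, which appears
    # to it as CUDA device 0.
    env = os.environ.copy()
    env['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    return subprocess.Popen(cmd, env=env)

proc = launch_on_gpu(['python', 'train.py', '--learning_rate', '0.1'], gpu_id=1)
proc.wait()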

Scheduling

Models are pre-assigned to GPUs, and the models assigned to each GPU run sequentially. For example, with 9 models to try and three free GPUs, each GPU is assigned three models. All three GPUs start training the first model in their queue; once a GPU finishes a model, it starts the next one it was assigned.
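A sketch of that pre-assignment (hypothetical code, but the same round-robin idea):

# 9 hyperparameter settings and 3 free GPUs -> 3 models per GPU.
models = list(range(9))        # stand-ins for hyperparameter settings
free_gpus = [0, 1, 2]

queues = {gpu: [] for gpu in free_gpus}
for i, model in enumerate(models):
    queues[free_gpus[i % len(free_gpus)]].append(model)

# queues == {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5, 8]};
# each GPU then trains its queue in order.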

Requirements

Python >= 3.7

All other requirements are listed in requirements.txt

Install

Clone the repository.

git clone https://github.com/taivanbat/gpu-hparam-search-scheduler.git 

Create a virtual environment and activate it

cd gpu-hparam-search-scheduler
virtualenv .env && source .env/bin/activate

Then install all the required packages

pip install -r requirements.txt

Demo

Run the demo

python demo.py

Usage:

--grid_search
    set this flag to do grid search

--num_random
    number of random hyperparameter sets to sample when not doing grid search

--train_file_path
    path to the main training file

--virtual_env_dir
    directory of a virtual environment to activate before train.py is called. Default is None

--leave_num_gpus
    number of GPUs to leave free. Useful if the workstation is shared

--memory_threshold
    a GPU using less than memory_threshold MB of memory is considered not in use

--pick_last_free_gpu
    if this flag is set, use the last free GPU even when it is the only one free

--log_dir
    where to store model logs
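For example, to grid-search on a shared machine while leaving one GPU free (illustrative values; the flags are the ones listed above):

python demo.py --grid_search --leave_num_gpus 1 --memory_threshold 500 --log_dir logs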

You can also look at the training file in demo_train.py

The output will look something like this:

OPTIMIZING OVER:
[learning_rate: [0.1, 0.01, 0.001], momentum: [0.9, 0.99]]
Running 3 processes. Leaving 0 GPU(s) free.
----------------------------------
Running process 0 on gpu 1
[learning_rate: 0.1, momentum: 0.99]
----------------------------------

----------------------------------
Running process 0 on gpu 3
[learning_rate: 0.001, momentum: 0.99]
----------------------------------

process 0 finished on gpu 1
----------------------------------
Running process 1 on gpu 1
[learning_rate: 0.001, momentum: 0.99]
----------------------------------

|   learning_rate |   momentum | log_dir   | cuda   |   accuracy |
|-----------------|------------|-----------|--------|------------|
|           0.001 |       0.99 | logs      | True   |     0.2595 |
|           0.1   |       0.99 | logs      | True   |     0.1    |
|           0.001 |       0.99 | logs      | True   |     0.2616 |
