
This project forked from nvidia-developer-blog/kubernetes-hyperparam-exp



Home Page: https://devblogs.nvidia.com/kubernetes-ai-hyperparameter-search-experiments

License: Other



Kubernetes for hyperparameter search experiments

This repository contains code and config files that accompany the following blog post: Kubernetes for AI Hyperparameter Search Experiments

Tested on Kubernetes version 1.10.11

Install guide: https://docs.nvidia.com/datacenter/kubernetes/kubernetes-install-guide/index.html

Hyperparameters for a machine learning model are options that are not optimized or learned during the training phase. They typically include the learning rate schedule, batch size, data augmentation options and others. Each can substantially affect the accuracy a model reaches on the same dataset. Two of the most common strategies for selecting the best hyperparameters for a model are grid search and random search. In the grid search method (also known as the parameter sweep method) you define the search space by enumerating the candidate values for each hyperparameter and train a model on every combination of those values. Random search instead trains only on sets of values sampled at random from that exhaustive set. Each trained model is then evaluated on a separate validation set.
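As a rough illustration of the difference between the two strategies, the sketch below enumerates a small grid and then draws a random subset from it. The hyperparameter names, their values, and the helper functions `grid_search` and `random_search` are hypothetical and are not taken from this repository.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [0.1, 0.2, 0.4],
    "batch_size": [128, 256, 512],
    "momentum": [0.8, 0.9],
}


def grid_search(space):
    """Return every combination of hyperparameter values (the exhaustive set)."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*space.values())]


def random_search(space, n_trials, seed=0):
    """Return n_trials distinct combinations sampled from the exhaustive set."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(grid_search(space), n_trials)


grid = grid_search(search_space)
sampled = random_search(search_space, n_trials=5)
print(len(grid), len(sampled))
```

Here the grid holds 3 x 3 x 2 = 18 combinations, while random search trains on only 5 of them. Materializing the full grid before sampling is fine for a small space like this one; for a very large space you would more likely sample each hyperparameter independently.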

This repository includes Kubernetes specification files and training scripts for running large-scale hyperparameter search experiments using Kubernetes on a GPU cluster, as shown in the figure below. The framework is flexible: it supports both grid search and random search, and it implements "version everything" so you can trace back all previously run experiments.

Reference Architecture

The training script is a modified version of the submission by David Page on Stanford's DAWNBench webpage. The key modification is that the training script now accepts hyperparameters by reading them from a YAML specification file.

Assuming you’ve already started by setting up a Kubernetes cluster, our solution for running hyperparameter search experiments consists of the following 7 steps:

  1. Specify the hyperparameter search space
  2. Develop a training script that can accept hyperparameters and apply them to the training routine
  3. Push the training scripts and hyperparameters to a Git repository for tracking
  4. Upload the training and test datasets to network storage such as an NFS server
  5. Write the Kubernetes Job specification files in YAML
  6. Submit multiple Kubernetes Job requests using the above specification template
  7. Analyze the results and pick the best hyperparameter set
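Steps 5 and 6 can be sketched as follows: one Job manifest is rendered per hyperparameter set from a shared template. This is only a minimal sketch under stated assumptions, not the template shipped in this repository. The container image, the script name `train.py`, the command-line flags, the Job name pattern, and the helper `render_jobs` are all hypothetical; the repository's own training script reads its hyperparameters from a YAML specification file instead.

```python
from string import Template

# Hypothetical Job template. The image, script name, and flags are assumptions
# made for illustration; only the overall Job structure is standard Kubernetes.
JOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: Job
metadata:
  name: hyperparam-job-$job_id
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: example.com/trainer:latest
        command: ["python", "train.py", "--lr", "$lr", "--batch-size", "$batch_size"]
        resources:
          limits:
            nvidia.com/gpu: 1
""")


def render_jobs(hyperparam_sets):
    """Map a manifest file name to the rendered YAML text for each set."""
    return {
        f"hyperparam-job-{i}.yaml": JOB_TEMPLATE.substitute(
            job_id=i, lr=hp["lr"], batch_size=hp["batch_size"]
        )
        for i, hp in enumerate(hyperparam_sets)
    }


manifests = render_jobs([
    {"lr": 0.1, "batch_size": 128},
    {"lr": 0.4, "batch_size": 512},
])
for name in manifests:
    print(name)
```

Each rendered manifest could then be written to disk and submitted with `kubectl create -f <file>`. Keeping the Job name tied to the index of the hyperparameter set makes it easier to match results back to the set that produced them.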
