
resnet-tf-dist-1's Introduction

Code modified from the official TensorFlow ResNet model.

ResNet in TensorFlow

Deep residual networks, or ResNets for short, introduced identity mappings to enable training of very deep convolutional neural networks. This folder contains a distributed implementation of ResNet for the CIFAR-10 dataset, written in TensorFlow.

CIFAR-10

Setup

You simply need to have the latest version of TensorFlow installed. First make sure you've added the models folder to your Python path; otherwise you may encounter an error like ImportError: No module named official.resnet.

export PYTHONPATH="$PYTHONPATH:/home/ubuntu/models"
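To confirm the path is set correctly before launching anything, a quick check along these lines may help. This is a minimal sketch; the helper name `is_importable` is hypothetical and not part of the repository.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. no `official`) raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


if __name__ == "__main__":
    name = "official.resnet"
    status = "found" if is_importable(name) else "NOT found; check PYTHONPATH"
    print(f"{name}: {status}")
```

Running this in the same shell where PYTHONPATH was exported should report the module as found.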

Download the dataset

Then download and extract the CIFAR-10 data from Alex Krizhevsky's website, specifying the destination with the --data_dir flag. Run the following:

python cifar10_download_and_extract.py

When training, use --data_dir to point at the location of the CIFAR-10 data downloaded in the previous step. Further flags are described in cifar10_main.py.
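Before training, it can be useful to confirm that the extraction produced the files the input pipeline expects. The sketch below assumes the layout used by the official TensorFlow CIFAR-10 example (a `cifar-10-batches-bin` subfolder holding `data_batch_1.bin` through `data_batch_5.bin` plus `test_batch.bin`); check these names against cifar10_main.py, since this repository may differ. `missing_cifar10_files` is a hypothetical helper.

```python
import os

# Assumed layout, taken from the official TensorFlow CIFAR-10 example.
SUBDIR = "cifar-10-batches-bin"
EXPECTED = [f"data_batch_{i}.bin" for i in range(1, 6)] + ["test_batch.bin"]


def missing_cifar10_files(data_dir):
    """Return the expected CIFAR-10 binary files that are absent under data_dir."""
    root = os.path.join(data_dir, SUBDIR)
    return [name for name in EXPECTED if not os.path.isfile(os.path.join(root, name))]
```

Calling `missing_cifar10_files` with the same directory passed to --data_dir returns an empty list when every expected file is present.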

Start training distributed ResNet model

This distributed training example uses 1 parameter server (ps) and 3 workers. The parameter servers and workers are listed in the parameter_servers and workers lists, respectively. Open one terminal window per task (four in this example) to start each training task.

  1. ssh to the remote machines
  2. activate the tensorflow environment
  3. (optional) run killall python to stop any leftover Python processes
  4. in each window, run the setup commands below, then exactly one of the cifar10_main.py commands (one task per window)
killall python
source activate tensorflow
python cifar10_main.py --job_name="ps" --task_index=0
python cifar10_main.py --job_name="worker" --task_index=0
python cifar10_main.py --job_name="worker" --task_index=1
python cifar10_main.py --job_name="worker" --task_index=2
  5. after the training tasks are done, kill the process holding the ps port by its id as follows:
lsof -wni tcp:2222
# will return process ids
# replace id with a process id returned above
kill id
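The mapping from the two lists to the per-window commands can be sketched as follows. The host addresses are placeholders and `launch_commands` is a hypothetical helper rather than code from the repository; only the list names, the flags, and the ps port 2222 come from this README.

```python
# Placeholder addresses (hypothetical); replace with the real hosts.
# The ps port 2222 is taken from the lsof step; the worker ports are assumed.
parameter_servers = ["10.0.0.1:2222"]
workers = ["10.0.0.2:2222", "10.0.0.3:2222", "10.0.0.4:2222"]


def launch_commands(parameter_servers, workers, script="cifar10_main.py"):
    """Return one (address, command) pair per task, parameter servers first."""
    jobs = [("ps", parameter_servers), ("worker", workers)]
    return [
        (address, f'python {script} --job_name="{job}" --task_index={index}')
        for job, addresses in jobs
        for index, address in enumerate(addresses)
    ]


if __name__ == "__main__":
    for address, command in launch_commands(parameter_servers, workers):
        print(f"{address}: {command}")
```

Each task index is simply the position of the host within its list, which is why adding a worker means adding one list entry and opening one more terminal.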

Use of tf.profiler

Changes made in cifar10_main_profiler.py

This file adds profiling, which produces a profile file (timeline) that must be viewed in Chrome. Download profiler-ui in order to view the timeline. The steps below use profile_100 as an example.

  1. rename resnet_run_loop_official.py back to resnet_run_loop.py, since the profiler needs no changes to the run loop
  2. run cifar10_main_profiler.py to generate the profile file (e.g. profile_100)
  3. run the command below
python ui.py --profile_context_path=profiler-ui/profile_100
  4. if the browser that pops up is not Chrome, copy the URL into Chrome. You will now be able to see the timeline

Use of JSON file

Changes made in resnet_run_loop.py

This file generates a JSON trace file during training. Open chrome://tracing in the Chrome browser, then load the JSON file. You will now be able to see the timeline.
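For a quick look at the trace without opening Chrome, the file can also be summarized in a few lines of Python. This sketch assumes the file follows the Chrome Trace Event format, with a top-level `traceEvents` list whose complete events (`"ph": "X"`) carry a `name` and a duration `dur` in microseconds. `total_duration_by_name` and `summarize_trace_file` are hypothetical helpers, not part of the repository.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace):
    """Sum the durations (microseconds) of complete events, grouped by event name."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    # Longest-running names first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def summarize_trace_file(path, top=10):
    """Load a trace JSON file and return its `top` longest-running event names."""
    with open(path) as f:
        return total_duration_by_name(json.load(f))[:top]
```

Passing the path of the generated JSON file to `summarize_trace_file` gives a rough ranking of where time is spent; the Chrome view remains the better tool for seeing how events overlap.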
