gcp-dl

Quickly and easily set up a cloud machine for deep learning experimentation in GCP. The "quickly" and "easily" parts are WIP.

STEPS

Completely Script Based Approach

For example, to run an instance with one GPU and 4 CPUs in us-east1-d, with a 1TB SSD boot disk and CUDA installed on it:

gcloud beta compute instances create eshvk-dl-fastai \
    --boot-disk-size=1TB --boot-disk-type=pd-ssd \
    --machine-type n1-standard-4 --zone us-east1-d \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata startup-script='#!/bin/bash
    echo "Checking for CUDA and installing."
    # Check for CUDA and try to install.
    if ! dpkg-query -W cuda; then
      curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
      dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
      apt-get update
      apt-get install cuda -y
    fi'
  • Connect to the instance and check if the CUDA driver has been installed by:
nvidia-smi

You should see a status table that lists the attached Tesla K80.

NOTE: If nvidia-smi is not found, first check whether the startup script has finished installing the driver. Run tail -f /var/log/syslog to follow its progress. The install takes a few minutes.
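
Instead of watching the log by hand, you can poll until nvidia-smi shows up. This is only a minimal sketch: the function name wait_for_command and its defaults are hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of gcp-dl): poll until a command appears on PATH.
# Usage: wait_for_command <command> [max_attempts] [sleep_seconds]
wait_for_command() {
  local cmd="$1"
  local max_attempts="${2:-60}"
  local sleep_seconds="${3:-10}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # command -v succeeds as soon as the binary is installed and on PATH.
    if command -v "$cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for $cmd" >&2
  return 1
}

# Example (run on the instance): wait up to about 10 minutes, then show the GPU table.
# wait_for_command nvidia-smi 60 10 && nvidia-smi
```

The command being on PATH only means the package is installed. If nvidia-smi then reports an error, the syslog is still the place to look.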

  • Instance creation and the CUDA install can also be combined by keeping the startup script in a file (install-gpu.sh) and passing it with --metadata-from-file:
gcloud beta compute instances create eshvk-dl-fastai \
    --boot-disk-size=1TB --boot-disk-type=pd-ssd \
    --machine-type n1-standard-4 --zone us-east1-d \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata-from-file startup-script=install-gpu.sh
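
The command above expects a local file named install-gpu.sh, whose contents are not shown here. A plausible sketch, assuming it simply mirrors the inline startup script from the first example, can be written out and syntax-checked like this:

```shell
# Assumption: install-gpu.sh mirrors the inline startup script shown earlier.
cat > install-gpu.sh <<'EOF'
#!/bin/bash
echo "Checking for CUDA and installing."
# Install CUDA only if the package is not already present.
if ! dpkg-query -W cuda; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi
EOF

# Check the script for syntax errors without running it.
bash -n install-gpu.sh
```

Keeping the script in a file avoids the quoting problems of a multi-line --metadata value and lets you reuse it across instances.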
  • Create a secondary SSD disk and mount it.
gcloud compute disks create eshvk-dl-fastai-disk --size 10TB --type pd-ssd --zone us-east1-d

gcloud compute instances attach-disk eshvk-dl-fastai --disk  eshvk-dl-fastai-disk --zone us-east1-d

  • SSH into the machine, then format and mount the disk using the instructions here.

For example:

# Here sdb is the device ID I get from lsblk
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
# Mount point
sudo mkdir -p /mnt/disks/persistent-data
# Mount disk
sudo mount -o discard,defaults /dev/sdb /mnt/disks/persistent-data
# Add an automatic mount for next time things start.
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /mnt/disks/persistent-data ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
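
Running the tee -a line twice leaves a duplicate entry in /etc/fstab. Here is a small sketch of an idempotent variant. The function append_once is a hypothetical helper, and the target file is a parameter so you can try it on a scratch copy first.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
# Usage: append_once <file> <line>
append_once() {
  local file="$1"
  local line="$2"
  # -x matches whole lines, -F treats the line as a fixed string, not a regex.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example on the instance (run as root, since /etc/fstab is root-owned):
# uuid=$(blkid -s UUID -o value /dev/sdb)
# append_once /etc/fstab "UUID=$uuid /mnt/disks/persistent-data ext4 discard,defaults,nofail 0 2"
```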
  • Get a GCS service key. To run a notebook or jobs remotely, create a service key in the GCS console. Once you've downloaded the key, rename it google_service_key.json and move it to the root directory of the repository.

  • Copy the script user-install.sh to the gcloud instance like so:

gcloud compute copy-files user-install.sh eshvk-dl-fastai:~/user-install.sh  --zone us-east1-d
  • Copy the service key google_service_key.json over similarly.
gcloud compute copy-files google_service_key.json eshvk-dl-fastai:~/google_service_key.json  --zone us-east1-d
  • Copy the files auth_and_start.sh and lookup_value_from_json over.
gcloud compute copy-files auth_and_start.sh lookup_value_from_json eshvk-dl-fastai:~/  --zone us-east1-d
  • SSH in, move auth_and_start.sh and lookup_value_from_json to /usr/local/bin, then run the install script with ./user-install.sh.

  • Firewall forwarding rules:

# This lets Jupyter accept connections from the outside world on port 8888.
gcloud compute firewall-rules create default-allow-jupyter --allow tcp:8888  --target-tags=allow-jupyter
# Add the tag to your instance so the rule applies to it.
gcloud compute instances add-tags eshvk-dl-fastai --tags allow-jupyter --zone us-east1-d
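
The rule above does not set a source range, so as far as I know it accepts connections from any address. If you want only your own machine to reach the notebook, one option is to narrow the rule afterwards. This is a sketch: restrict_jupyter_rule is a hypothetical helper, and the client IP is whatever public address you connect from.

```shell
# Hypothetical helper: restrict the Jupyter rule to a single client address.
# Usage: restrict_jupyter_rule <client-ip>
restrict_jupyter_rule() {
  local client_ip="$1"
  # /32 limits the allowed source range to exactly one IPv4 address.
  gcloud compute firewall-rules update default-allow-jupyter \
    --source-ranges "${client_ip}/32"
}

# Example (203.0.113.7 is a documentation address, substitute your own):
# restrict_jupyter_rule 203.0.113.7
```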

  • Now SSH into the machine, run auth_and_start.sh jupyter notebook, and open something like http://<external-ip>:8888 in your browser. The auth_and_start.sh bit gets you authenticated with Google Magic.
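
The <external-ip> placeholder can be looked up with gcloud instead of the console. This is a sketch, assuming the instance has a single network interface with an external (NAT) address. Both function names are hypothetical.

```shell
# Hypothetical helpers, not part of gcp-dl.

# Print the instance's external IP. Assumes one NIC with an external address.
external_ip() {
  local instance="$1"
  local zone="$2"
  gcloud compute instances describe "$instance" --zone "$zone" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Build the notebook URL for a given IP and optional port (default 8888).
jupyter_url() {
  local ip="$1"
  local port="${2:-8888}"
  printf 'http://%s:%s\n' "$ip" "$port"
}

# Example:
# jupyter_url "$(external_ip eshvk-dl-fastai us-east1-d)"
```

Unless you reserved a static address, the external IP can change when the instance is stopped and started, so look it up again after a restart.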

Credits

This is based on fast.ai's course setup and easy-python-ml by ZacP.

Contributors

eshvk
