This project forked from aws/sagemaker-rl-container

A set of dockerfiles that provide Reinforcement Learning solutions for use in SageMaker.

License: Apache License 2.0

Amazon SageMaker RL Containers

A set of Dockerfiles that enables Reinforcement Learning (RL) solutions to be used in SageMaker.

The SageMaker team uses this repository to build its official RL images. To learn how to use any of these images on SageMaker, see the Python SDK. As an end user, you will typically be interested in this repository if you need implementation details of an official image, or if you want to use it as a starting point for your own customized RL image.

For information on running RL jobs on SageMaker: SageMaker RLEstimators.

For notebook examples: SageMaker Notebook Examples.

Table of Contents

  1. Getting Started
  2. RL Images Provided by SageMaker
  3. Building Your Image
  4. Running the Tests

Getting Started

Prerequisites

Make sure you have installed all of the following prerequisites on your development machine:

  • Docker

For Testing on GPU

  • Nvidia-Docker

Terminologies

Toolkit

Toolkits are libraries that provide specific algorithms to train a Reinforcement Learning model. We currently provide Dockerfiles for these three toolkits:

  • Coach
  • Ray
  • Vowpal Wabbit (VW)

Framework

Framework refers to a Deep Learning framework or library that a toolkit may need in order to train an algorithm. Where a framework is required, the toolkit's Dockerfile uses a prebuilt Amazon SageMaker framework image as its base image. Currently we are using these three frameworks:

  • TensorFlow (used for Ray and Coach)
  • PyTorch (used for Ray)
  • MXNet (used for Coach)

Note: VW does not require a framework.

RL Images Provided by SageMaker

MXNet Coach Images:

  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11.0-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-mxnet:coach0.11.0-gpu-py3

TensorFlow Coach Images:

  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10.1-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.10.1-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.0-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.1-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.0-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:coach0.11.1-gpu-py3
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-coach-container:coach-1.0.0-tf-cpu-py3
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-coach-container:coach-1.0.0-tf-gpu-py3

TensorFlow Ray Images:

  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-cpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6-gpu-py3
  • 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-gpu-py3
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.2-tf-cpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.2-tf-gpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-tf-cpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-tf-gpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-tf-cpu-py37
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-tf-gpu-py37

PyTorch Ray Images:

  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-torch-cpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-0.8.5-torch-gpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-torch-cpu-py36
  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-torch-gpu-py36

Vowpal Wabbit Images:

  • 462105765813.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-vw-container:vw-8.7.0-cpu

List of supported SageMaker regions.
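
The newer image names above (those in account 462105765813) appear to follow a regular pattern, so a URI can be composed from its parts. The following is a minimal sketch under that assumption, not an official API: the function name rl_image_uri and its parameters are hypothetical, and the pattern is inferred only from the URIs listed above. It does not cover the older sagemaker-rl-mxnet and sagemaker-rl-tensorflow names.

```python
from typing import Optional

# Account ID that appears in the newer image URIs listed above.
RL_ACCOUNT = "462105765813"


def rl_image_uri(region: str, toolkit: str, version: str, processor: str,
                 framework: Optional[str] = None,
                 py_version: Optional[str] = None) -> str:
    """Compose <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-<toolkit>-container:<tag>.

    Hypothetical helper: the tag layout is inferred from the list above.
    """
    if processor not in ("cpu", "gpu"):
        raise ValueError("processor must be 'cpu' or 'gpu'")
    # Tag parts are joined with '-'; framework and Python version are
    # left out when not given (the VW image has neither).
    parts = [toolkit, version, framework, processor, py_version]
    tag = "-".join(p for p in parts if p)
    return (f"{RL_ACCOUNT}.dkr.ecr.{region}.amazonaws.com/"
            f"sagemaker-rl-{toolkit}-container:{tag}")


# Ray 1.6.0 with TensorFlow on CPU, and the VW image, in us-west-2.
print(rl_image_uri("us-west-2", "ray", "1.6.0", "cpu",
                   framework="tf", py_version="py37"))
print(rl_image_uri("us-west-2", "vw", "8.7.0", "cpu"))
```

Only combinations that appear in the list above are known to exist; the helper will happily build a URI for an image that was never published.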

Building Your Image

NOTE: The Amazon SageMaker RL team will provide Dockerfiles for the newer Ray images (Ray >= 1.6.0) soon.

Amazon SageMaker utilizes Docker containers to run all training jobs and inference endpoints.

The Docker images are built from the Dockerfiles in this repository, which are located under:

  • coach/docker/
  • ray/docker/
  • vw/docker/

The Dockerfiles are grouped by RL toolkit and toolkit version. Within each version, they are separated by framework (if needed). For example, the Dockerfile for Coach v0.11.0 with the MXNet framework can be found at: coach/docker/0.11.0/Dockerfile.mxnet.

For toolkits Ray and Coach, the Dockerfiles use deep learning framework images provided by SageMaker as their "base" images.

These "base" images are specified with the following naming convention:

520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>:<framework_version>-<processor>-py3
  • <framework> can be tensorflow-scriptmode (with <framework_version> 1.11.0 or higher depending on the toolkit requirements) or mxnet (with <framework_version> 1.3.0 or higher depending on the toolkit requirements);
  • <processor> can be cpu or gpu;
  • for valid <region> values please see list of supported SageMaker regions.
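
To sanity-check a base image name against this convention before pulling it, a small parser can help. This is a sketch only: the regular expression and the name parse_base_image are assumptions built from the convention described above, and matching the pattern does not mean the image exists in ECR.

```python
import re

# Pattern derived from the naming convention above:
# <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>:<framework_version>-<processor>-py3
BASE_IMAGE_RE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"sagemaker-(?P<framework>[a-z-]+):"
    r"(?P<framework_version>\d+(?:\.\d+)*)-(?P<processor>cpu|gpu)-py3$"
)


def parse_base_image(uri: str) -> dict:
    """Split a base image URI into its named parts, or raise ValueError."""
    match = BASE_IMAGE_RE.match(uri)
    if match is None:
        raise ValueError(f"not a base image URI: {uri}")
    return match.groupdict()


parts = parse_base_image(
    "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet:1.3.0-gpu-py3"
)
print(parts["framework"], parts["framework_version"], parts["processor"])
```

A URI that still contains a placeholder such as <region> is rejected, which catches the common mistake of pasting the template unchanged.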

Before building images:

Pull the deep learning framework "base" image, which requires Docker, AWS credentials, and the AWS CLI.

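# Log in to the SageMaker ECR account.
# Note: `aws ecr get-login` is AWS CLI v1 syntax and was removed in AWS CLI v2. With v2, use instead:
#   aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin 520713654638.dkr.ecr.<region>.amazonaws.com
$(aws ecr get-login --no-include-email --region <region> --registry-ids 520713654638)
# Pull the Docker image from ECR
docker pull 520713654638.dkr.ecr.<region>.amazonaws.com/sagemaker-<framework>:<framework_version>-<processor>-py3

# Example

$(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 520713654638)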

# CPU TensorFlow image
docker pull 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-scriptmode:1.11.0-cpu-py3

# GPU MXNet image
docker pull 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet:1.3.0-gpu-py3

To build an RL Docker image:

# All build instructions assume you're building from the root directory of the sagemaker-rl-container.

# CPU
docker build -t <image_name>:<tag> -f <rl_toolkit>/docker/<rl_toolkit_version>/Dockerfile.<framework> --build-arg processor=cpu .

# GPU
docker build -t <image_name>:<tag> -f <rl_toolkit>/docker/<rl_toolkit_version>/Dockerfile.<framework> --build-arg processor=gpu .
# Example

# Ray TensorFlow CPU
docker build -t tf-ray:0.6.5-cpu-py3 -f ray/docker/0.6.5/Dockerfile.tf --build-arg processor=cpu .

# Coach TensorFlow GPU
docker build -t tf-coach:0.11.0-gpu-py3 -f coach/docker/0.11.0/Dockerfile.tf --build-arg processor=gpu .

# Coach MXNet CPU
docker build -t mxnet-coach:0.11.0-cpu-py3 -f coach/docker/0.11.0/Dockerfile.mxnet --build-arg processor=cpu .

# VW CPU
docker build -t vw:8.7.0-cpu -f vw/docker/8.7.0/Dockerfile .
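
When building several variants, it can be convenient to generate the build commands rather than type them. The sketch below assembles the same argument list as the examples above. The function docker_build_command and its parameters are hypothetical; the Dockerfile path layout is the one described in this section. It only builds the command and does not run Docker.

```python
from typing import List, Optional


def docker_build_command(toolkit: str, version: str, image: str, tag: str,
                         framework: Optional[str] = None,
                         processor: Optional[str] = None) -> List[str]:
    """Return the `docker build` argument list for one toolkit/version/framework."""
    # Dockerfiles live at <toolkit>/docker/<version>/Dockerfile[.<framework>]
    dockerfile = f"{toolkit}/docker/{version}/Dockerfile"
    if framework:
        dockerfile += f".{framework}"
    cmd = ["docker", "build", "-t", f"{image}:{tag}", "-f", dockerfile]
    if processor:
        # The VW example above passes no processor build argument.
        cmd += ["--build-arg", f"processor={processor}"]
    cmd.append(".")  # build context: the repository root
    return cmd


print(" ".join(docker_build_command("ray", "0.6.5", "tf-ray", "0.6.5-cpu-py3",
                                    framework="tf", processor="cpu")))
print(" ".join(docker_build_command("vw", "8.7.0", "vw", "8.7.0-cpu")))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root to run the build.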

Running the Tests

Running the tests requires installation of test dependencies.

git clone https://github.com/aws/sagemaker-rl-container.git
cd sagemaker-rl-container
pip install .

Tests are defined in test/ and include local integration and SageMaker integration tests.

Local Integration Tests

Running local integration tests requires Docker and AWS credentials, because the tests make calls to a couple of AWS services. Both the local integration tests and the SageMaker integration tests require the configuration specified in their respective conftest.py.

Local integration tests on GPU require Nvidia-Docker.

Before running local integration tests:

  1. Build your Docker image.
  2. Pass in the correct pytest arguments to run tests against your Docker image.

If you want to run local integration tests, then use:

# Required arguments for integration tests are found in test/conftest.py
pytest test/integration/local --toolkit <toolkit_to_run_tests_for> \
                              --docker-base-name <your_docker_image> \
                              --tag <your_docker_image_tag> \
                              --processor <cpu_or_gpu>
# Example
pytest test/integration/local --toolkit coach \
                              --docker-base-name custom-rl-coach-image \
                              --tag 1.0 \
                              --processor cpu
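
The repository's test/conftest.py is not reproduced here. As a rough illustration of how command-line options like the ones above are commonly declared in pytest, a hypothetical conftest.py might look like the sketch below. The option names come from the commands above; the choices, help strings, and the helper image_from_options are assumptions and may differ from the real file.

```python
# Hypothetical conftest.py sketch; not the repository's actual file.


def pytest_addoption(parser):
    """Register the command-line options used by the local integration tests."""
    parser.addoption("--toolkit", choices=["coach", "ray", "vw"],
                     help="RL toolkit to run the tests for (assumed choices)")
    parser.addoption("--docker-base-name", default=None,
                     help="name of the Docker image under test")
    parser.addoption("--tag", default=None,
                     help="tag of the Docker image under test")
    parser.addoption("--processor", choices=["cpu", "gpu"], default="cpu",
                     help="processor type the image was built for")


def image_from_options(config):
    """Combine the base name and tag options into '<name>:<tag>'."""
    name = config.getoption("--docker-base-name")
    tag = config.getoption("--tag")
    return f"{name}:{tag}"
```

In a real conftest.py, a fixture would typically call image_from_options(request.config) so that each test receives the image name to run.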

SageMaker Integration Tests

SageMaker integration tests require your Docker image to be within an Amazon ECR repository (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_Console_Repositories.html).

The Docker base name is your ECR repository namespace (https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html).

The instance type is the Amazon SageMaker instance type that the SageMaker integration test will run on.

Before running SageMaker integration tests:

  1. Build your Docker image.
  2. Push the image to your ECR repository.
  3. Pass in the correct pytest arguments to run tests on SageMaker against the image within your ECR repository.
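
Step 2 can be sketched as follows. This assumes that the local image is named <docker_base_name>:<tag>, that an ECR repository with the same name already exists in your account, and that you are already logged in to that registry; the function ecr_push_commands and the region parameter are hypothetical. The sketch only builds the docker tag and docker push argument lists and does not run them.

```python
from typing import List


def ecr_push_commands(aws_id: str, region: str,
                      docker_base_name: str, tag: str) -> List[List[str]]:
    """Return the `docker tag` and `docker push` commands for an ECR repository."""
    registry = f"{aws_id}.dkr.ecr.{region}.amazonaws.com"
    remote = f"{registry}/{docker_base_name}:{tag}"
    return [
        # Give the locally built image its fully qualified ECR name...
        ["docker", "tag", f"{docker_base_name}:{tag}", remote],
        # ...then upload it to the repository.
        ["docker", "push", remote],
    ]


for cmd in ecr_push_commands("123456789012", "us-west-2",
                             "custom-rl-coach-image", "1.0"):
    print(" ".join(cmd))
```

The account ID 123456789012 is a placeholder; substitute your own 12-digit AWS account ID and region.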

If you want to run a SageMaker integration end-to-end test on Amazon SageMaker, then use:

# Required arguments for integration tests are found in test/conftest.py
pytest test/integration/sagemaker --toolkit <toolkit_to_run_tests_for> \
                                  --aws-id <your_aws_id> \
                                  --docker-base-name <your_docker_image> \
                                  --instance-type <amazon_sagemaker_instance_type> \
                                  --tag <your_docker_image_tag>
# Example
pytest test/integration/sagemaker --toolkit coach \
                                  --aws-id 123456789012 \
                                  --docker-base-name custom-rl-coach-image \
                                  --instance-type ml.m4.xlarge \
                                  --tag 1.0

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

This library is licensed under the Apache 2.0 License.

Note: Specific licenses for toolkits and frameworks, if any, can be found in <toolkit>/docker/LICENSE or in the framework's image.

Contributors

nadiaya, goel-akas, yijiezh, yunzhe-tao, goelakash, laurenyu, yangaws, longyuzhao, choibyungwook, jesterhazy, sidd1809, garvijayaud
