sagemaker_model_lifecycle

A collection of labs to demonstrate how to package, train, and deploy a model using Amazon SageMaker

This example shows how to package an algorithm for use with SageMaker.

SageMaker supports two execution modes: training, where the algorithm uses input data to train a new model, and serving, where the algorithm accepts HTTP requests and uses the previously trained model to perform inference (also called "scoring", "prediction", or "transformation").

This binary classification algorithm supports both training and scoring in SageMaker with the same container image. It is also perfectly reasonable to build an algorithm that supports only training or only scoring, or one that uses separate container images for training and scoring.

In order to build a production grade inference server into the container, we use the following stack to make the implementer's job simple:

  1. nginx is a lightweight layer that handles the incoming HTTP requests and manages the I/O in and out of the container efficiently.
  2. gunicorn is a WSGI pre-forking worker server that runs multiple copies of your application and load balances between them.
  3. flask is a simple web framework used in the inference app that you write. It lets you respond to calls on the /ping and /invocations endpoints without having to write much code. A minimal sketch of such an app follows this list.
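
For orientation, here is a minimal sketch of what the Flask inference app might look like. The inference logic is a placeholder; the real predictor.py in model_src wires in your own model code:

    import flask

    app = flask.Flask(__name__)

    @app.route("/ping", methods=["GET"])
    def ping():
        # Health check: SageMaker calls this to verify the container is up.
        return flask.Response(response="\n", status=200, mimetype="application/json")

    @app.route("/invocations", methods=["POST"])
    def invocations():
        # Decode the request body, run inference, and return the result.
        data = flask.request.data.decode("utf-8")
        result = str(len(data))  # placeholder for real model inference
        return flask.Response(response=result, status=200, mimetype="text/csv")

In this layout, wsgi.py typically does little more than import this module and expose its app object for gunicorn's workers.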

The Structure of the Sample Code

The components are as follows:

  • Dockerfile: The Dockerfile describes how the image is built and what it contains. It is a recipe for your container and gives you tremendous flexibility to construct almost any execution environment you can imagine. Here, we use the Dockerfile to describe a pretty standard Python / TensorFlow stack and the simple scripts that we're going to add to it. See the Dockerfile reference for what's possible here.

  • build_and_push.sh: The script to build the Docker image (using the Dockerfile above) and push it to the Amazon EC2 Container Registry (ECR) so that it can be deployed to SageMaker. Specify the name of the image as the argument to this script. The script will generate a full name for the repository in your account and your configured AWS region. If this ECR repository doesn't exist, the script will create it.

  • model_src: The directory that contains the application to run in the container. See the next section for details about each of the files.

  • local_test: A directory containing scripts and a setup for running simple training and inference jobs locally so that you can test that everything is set up correctly. See below for details.

The application that runs inside the container

When SageMaker starts a container, it invokes the container with an argument of either train or serve. We have set this container up so that the argument is treated as the command that the container executes: when training, it runs the included train program, and when serving, it runs the serve program.
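
Purely to illustrate that dispatch, an explicit Python entry point with the same behavior might look like the sketch below; this sample instead lets Docker execute the argument directly as a program of that name:

    import subprocess
    import sys

    if __name__ == "__main__":
        # SageMaker passes exactly one argument: "train" or "serve".
        mode = sys.argv[1] if len(sys.argv) > 1 else "serve"
        if mode not in ("train", "serve"):
            sys.exit("expected 'train' or 'serve', got %r" % mode)
        # Assumes train and serve are executable scripts on the PATH.
        subprocess.check_call([mode])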

  • train: The main program for training the model. When you build your own algorithm, you'll edit this to include your training code. A minimal sketch follows this list.
  • serve: The wrapper that starts the inference server. In most cases, you can use this file as-is.
  • wsgi.py: The startup shell for the individual server workers. This only needs to change if you rename predictor.py or move it.
  • predictor.py: The algorithm-specific inference server. This is the file that you modify with your own algorithm's code.
  • nginx.conf: The configuration for the nginx master server that manages the multiple workers.
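
A bare-bones train program, using the standard /opt/ml paths shown later in this README, might look like this sketch (the model fitting itself is a placeholder):

    import json
    import os
    import sys
    import traceback

    PREFIX = "/opt/ml"
    PARAM_PATH = os.path.join(PREFIX, "input/config/hyperparameters.json")
    TRAIN_CHANNEL = os.path.join(PREFIX, "input/data/training")
    MODEL_DIR = os.path.join(PREFIX, "model")
    OUTPUT_DIR = os.path.join(PREFIX, "output")

    def main():
        try:
            # SageMaker serializes all hyperparameter values as strings.
            with open(PARAM_PATH) as f:
                params = json.load(f)
            # ... read the files under TRAIN_CHANNEL and fit a model here ...
            with open(os.path.join(MODEL_DIR, "model.txt"), "w") as f:
                f.write(json.dumps(params))  # placeholder for a serialized model
        except Exception:
            # A failure file in /opt/ml/output marks the training job as failed.
            with open(os.path.join(OUTPUT_DIR, "failure"), "w") as f:
                f.write(traceback.format_exc())
            sys.exit(1)

    if __name__ == "__main__":
        main()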

Setup for local testing

The subdirectory local_test contains scripts and sample data for testing the built container image on the local machine. When building your own algorithm, you'll want to modify it appropriately.

  • train-local.sh: Instantiate the container configured for training.
  • serve-local.sh: Instantiate the container configured for serving.
  • predict.sh: Run predictions against a locally instantiated server. A Python equivalent is sketched after this list.
  • test_dir: The directory that gets mounted into the container with test data mounted in all the places that match the container schema.
  • sample_data.js: Sample data used by predict.sh for testing the server.
  • sample_labels.js: Correct labels for comparison with output provided by the predict.sh script.
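
For reference, a rough Python equivalent of predict.sh might look like this sketch; it assumes the locally served container listens on port 8080 and that sample_data.js holds the request payload:

    import urllib.request

    with open("sample_data.js", "rb") as f:
        payload = f.read()

    req = urllib.request.Request(
        "http://localhost:8080/invocations",  # assumed local port
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))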

The directory tree mounted into the container

The tree under test_dir is mounted into the container under /opt/ml and mimics the directory structure that SageMaker would create for the running container during training or hosting.

├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   ├── inputdataconfig.json
│   │   └── resourceconfig.json
│   └── data
│       ├── eval
│       │   ├── t10k-images-idx3-ubyte
│       │   └── t10k-labels-idx1-ubyte
│       └── training
│           ├── train-images-idx3-ubyte
│           └── train-labels-idx1-ubyte
├── model
└── output

  • hyperparameters.json: The hyperparameters for the training job.
  • resourceconfig.json: The details of the local host running the container and any other containers currently executing training.
  • inputdataconfig.json: The details about all data channels configured for this training job. A sketch of reading this file follows the list.
  • train-images-idx3-ubyte: The training data.
  • train-labels-idx1-ubyte: The training labels.
  • t10k-images-idx3-ubyte: The evaluation data.
  • t10k-labels-idx1-ubyte: The evaluation labels.
  • model: The directory where the algorithm writes the model file.
  • output: The directory where the algorithm can write its success or failure file.
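
To make the channel layout concrete, a training program could discover its data channels as in this sketch, assuming each top-level key in inputdataconfig.json names a channel directory under /opt/ml/input/data:

    import json
    import os

    with open("/opt/ml/input/config/inputdataconfig.json") as f:
        channels = json.load(f)

    for name in channels:  # "training" and "eval" in this sample
        channel_dir = os.path.join("/opt/ml/input/data", name)
        print(name, os.listdir(channel_dir))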

Environment variables

When you create an inference server, you can control some of Gunicorn's options via environment variables. These can be supplied as part of the CreateModel API call.

Parameter                Environment Variable              Default Value
---------                --------------------              -------------
number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
timeout                  MODEL_SERVER_TIMEOUT              60 seconds
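
As a sketch of how the serve wrapper can honor these variables (leaving out the nginx side for brevity, and assuming gunicorn is installed in the image with wsgi:app as the entry point):

    import multiprocessing
    import os
    import subprocess

    # Fall back to the documented defaults when the variables are unset.
    workers = int(os.environ.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))
    timeout = int(os.environ.get("MODEL_SERVER_TIMEOUT", 60))

    subprocess.check_call([
        "gunicorn",
        "--workers", str(workers),
        "--timeout", str(timeout),
        "--bind", "unix:/tmp/gunicorn.sock",  # assumed socket that nginx proxies to
        "wsgi:app",
    ])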
