License: MIT License


MindsDB SageMaker Container

This repository contains the MindsDB containers for use with SageMaker.

The MindsDB container supports two execution modes on SageMaker: training, where MindsDB uses the input data to train a new model, and serving, where it accepts HTTP requests and uses the previously trained model to return predictions.
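SageMaker selects between these two modes by passing `train` or `serve` as the container's first argument, which is the standard contract for custom SageMaker containers. A minimal sketch of how such an entrypoint could dispatch (the `train` and `serve` bodies here are hypothetical stand-ins, not the repository's actual code):

```python
def train():
    # Hypothetical stand-in: a real entrypoint would read the data under
    # /opt/ml/input/data/training and write model artifacts to /opt/ml/model.
    return "training"

def serve():
    # Hypothetical stand-in: a real entrypoint would start an HTTP server
    # answering /ping health checks and /invocations prediction requests.
    return "serving"

def dispatch(argv):
    # SageMaker launches the image as `docker run <image> train`
    # or `docker run <image> serve`.
    mode = argv[1] if len(argv) > 1 else None
    if mode == "train":
        return train()
    if mode == "serve":
        return serve()
    raise ValueError("expected 'train' or 'serve', got: %r" % (mode,))

print(dispatch(["entrypoint", "train"]))  # -> training
```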


Build an image

Execute the following command to build the image:

docker build -t mindsdb-impl .

Note that mindsdb-impl will be the name of the image.

Test the container locally

All of the files for testing the setup are located inside the local_test directory.

Test directory

  • train_local.sh: Instantiate the container configured for training.
  • serve_local.sh: Instantiate the container configured for serving.
  • predict.sh: Run predictions against a locally instantiated server.
  • test-dir: This subdirectory is mounted in the container.
    • input/data/training/file.csv: The training data.
    • model: The directory where MindsDB writes the model files.
    • output: The directory where MindsDB can write its failure file.
  • test_data: This subdirectory contains a few tabular-format datasets used for getting predictions.
  • call.py: A CLI for testing a model deployed to a SageMaker endpoint.

All of the files under test-dir are mounted into the container and mimic the SageMaker directory structure.
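Inside the container, those mounts appear under SageMaker's conventional /opt/ml layout. The paths the container reads and writes can be sketched as:

```python
from pathlib import PurePosixPath

# Standard SageMaker container layout; the local test-dir mirrors these paths.
PREFIX = PurePosixPath("/opt/ml")
TRAINING_DATA = PREFIX / "input" / "data" / "training"  # mounted training CSVs
MODEL_DIR = PREFIX / "model"                            # trained model artifacts
OUTPUT_DIR = PREFIX / "output"                          # failure file on error

print(TRAINING_DATA)  # -> /opt/ml/input/data/training
```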

Run tests

To train the model, run the train script, passing the tag name of the Docker image:

./train_local.sh mindsdb-impl

The train script uses the dataset located in the input/data/training/ directory.

Then start the server:

./serve_local.sh mindsdb-impl

Then make predictions by specifying a payload file in JSON format:

./predict.sh payload.json

Push the image to Amazon Elastic Container Registry

Use the build-and-push.sh shell script to push the latest image to Amazon Elastic Container Registry (ECR). Run it as:

./build-and-push.sh mindsdb-impl

The script looks for an AWS ECR repository in your default region and creates a new one if it doesn't exist.

Training

When you create a training job, Amazon SageMaker sets up the environment, performs the training, and then stores the model artifacts in the location you specified when you created the job.

Required parameters

  • Algorithm source: Choose Your own algorithm and provide the registry path where the MindsDB image is stored in Amazon ECR, e.g. 846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl
  • Input data configuration: Choose S3 as the data source and provide the path to the bucket where the dataset is stored, e.g. s3://bucket/path-to-your-data/
  • Output data configuration: The S3 location where the model artifacts will be stored, e.g. s3://bucket/path-to-write-models/

Add HyperParameters

You can use hyperparameters to finely control training. The only required hyperparameter for training models with MindsDB is to_predict: the column you want to learn to predict from the rest of the data in the file, e.g. to_predict = Class.
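SageMaker delivers hyperparameters to the container as a flat JSON object of string values at /opt/ml/input/config/hyperparameters.json. A sketch of how a container entrypoint could read to_predict, using a temporary directory in place of /opt/ml/input/config for the demo:

```python
import json
import os
import tempfile

# A temp directory stands in for /opt/ml/input/config; SageMaker itself writes
# hyperparameters.json there when the training job starts.
config_dir = tempfile.mkdtemp()
path = os.path.join(config_dir, "hyperparameters.json")
with open(path, "w") as f:
    json.dump({"to_predict": "Class"}, f)

# What the entrypoint would do at training time:
with open(path) as f:
    hyperparams = json.load(f)
to_predict = hyperparams["to_predict"]
print(to_predict)  # -> Class
```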

Inference

You can also create the model, endpoint configuration, and endpoint using the AWS Management Console.

Create model

Choose a role that has the AmazonSageMakerFullAccess IAM policy attached. Next, provide the location of the model artifacts and the inference code.

  • Location of inference code image: The ECR image, e.g. 846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl:latest
  • Location of model artifacts (optional): The S3 path where the models are saved. This is the same location you provided in the training job, e.g. s3://bucket/path-to-write-models/

Create endpoint

First, create an endpoint configuration. In the configuration, specify which models to deploy and the hardware requirements for each. The only required option is Endpoint configuration name; then add the previously created model. Next, go to Create and configure endpoint, add the Endpoint name, and attach the endpoint configuration. It usually takes a few minutes to start the instance and create the endpoint.
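These console steps correspond to the SageMaker CreateEndpointConfig and CreateEndpoint API calls. A sketch of the request parameters only; in a real script they would be passed to a boto3 SageMaker client as `create_endpoint_config(**endpoint_config)` and `create_endpoint(**endpoint)`, and all names here are illustrative:

```python
# Illustrative request bodies; the config and variant names are hypothetical.
endpoint_config = {
    "EndpointConfigName": "mindsdb-impl-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "mindsdb-impl",       # the model created in the previous step
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m4.xlarge",
    }],
}
endpoint = {
    "EndpointName": "mindsdb-impl",
    "EndpointConfigName": endpoint_config["EndpointConfigName"],
}
print(endpoint["EndpointConfigName"])  # -> mindsdb-impl-config
```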

Call endpoint

When the endpoint is in the InService status, you can get predictions from a Python script or notebook:

import boto3

endpoint_name = 'mindsdb-impl'

# Read the test dataset
with open('diabetes-test.csv', 'r') as reader:
    payload = reader.read()

# Call the SageMaker runtime endpoint
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType='text/csv',
    Accept='application/json'
)
print(response['Body'].read().decode('ascii'))

# Example MindsDB prediction response:
# {
#   "prediction": "* We are 96% confident the value of \"Class\" is positive.",
#   "class_confidence": [0.964147493532568]
# }
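Because the response body is JSON, the confidence can be pulled out programmatically. A small sketch assuming the response shape shown above (the raw string here is a shortened, hypothetical example):

```python
import json

# Hypothetical raw response body, following the shape shown above.
raw = '{"prediction": "positive", "class_confidence": [0.964147493532568]}'
result = json.loads(raw)
confidence = result["class_confidence"][0]
print(round(confidence * 100))  # -> 96
```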

Alternatively, use the call.py CLI located under local_test, e.g.:

python3 call.py --endpoint mindsdb-impl --dataset test_data/diabetes-test.json --content-type application/json

Using the SageMaker Python SDK

SageMaker provides an Estimator implementation that runs SageMaker-compatible custom Docker containers, which lets us run our own MindsDB implementation.

Starting train job

The Estimator defines how the container is used for training. This simple example includes the required configuration to start a training job:

import sagemaker as sage

# Add the AmazonSageMaker execution role here
role = "arn:aws:iam:"

sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
bucket_path = "s3://mdb-sagemaker/models/"
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/mindsdb_lts:latest'.format(account, region)

# The to_predict hyperparameter is required by the MindsDB container
mindsdb_impl = sage.estimator.Estimator(image,
                                        role,
                                        1,
                                        'ml.m4.xlarge',
                                        output_path=bucket_path,
                                        sagemaker_session=sess,
                                        base_job_name="mindsdb-lts-sdk",
                                        hyperparameters={"to_predict": "Class"})

dataset_location = 's3://mdb-sagemaker/diabetes.csv'
mindsdb_impl.fit(dataset_location)

Deploy model and create endpoint

The model can be deployed to SageMaker by calling the deploy method.

predictor = mindsdb_impl.deploy(1, 'ml.m4.xlarge', endpoint_name='mindsdb-impl')

The deploy method configures the Amazon SageMaker hosting services endpoint, deploys the model, and launches the endpoint to host it. It returns a RealTimePredictor object from which you can get predictions.

with open('test_data/diabetes-test.csv', 'r') as reader:
    when_data = reader.read()
print(predictor.predict(when_data).decode('utf-8'))

The predict endpoint accepts test datasets in CSV, JSON, and Excel formats.
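For JSON input, a payload can be built from a CSV test file with the standard library. A sketch only: the column names are illustrative, and the exact JSON shape the container expects is not documented here.

```python
import csv
import io
import json

# Hypothetical CSV test data; columns echo the diabetes example but are not
# the container's required schema.
csv_text = "Glucose,BMI\n148,33.6\n85,26.6\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
payload = json.dumps(rows)
print(payload)
```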

Delete the endpoint

Don't forget to delete the endpoint when you are no longer using it.

sess.delete_endpoint('mindsdb-impl')


