AWS SageMaker compatible container to run mlflow trainings.
The container is created with using Amazon SageMaker Containers library
Requirements:
- Python >=3.6
- pip
- aws cli
- aws account
Create quickstart project dir
$ mkdir sm-mlflow-quickstart && cd sm-mlflow-quickstart
Clone mlflow repo to get sources of MLFlow project examples and set env var to one of examples
$ git clone [email protected]:mlflow/mlflow.git
MLFLOW_PROJECT_PATH=mlflow/examples/sklearn_elasticnet_wine
Install sagemaker python sdk
pip install sagemaker
Authorize in AWS SageMaker using AWS CLI
Create SageMaker execution role using AWS Console and export it as env var
export SAGEMAKER_ROLE=`your-role`
Create control_script.py
import os
from contextlib import contextmanager
import sagemaker
from sagemaker.estimator import Estimator, DIR_PARAM_NAME
from sagemaker.utils import create_tar_file
# fetch sagemaker role and mlflow project path
sagemaker_role = os.environ.get('SAGEMAKER_ROLE', 'SageMakerRole')
mlflow_project_path = os.environ.get('MLFLOW_PROJECT_PATH')
# sagemaker-mlflow-conainer built and pushed to your ECR image name
image = 'ecr-image-reference'
# helper function
@contextmanager
def change_dir(new_dir):
old_dir = os.getcwd()
os.chdir(new_dir)
yield
os.chdir(old_dir)
if __name__ == '__main__':
mlflow_project_dir = f'file://{mlflow_project_path}'
sm_session = sagemaker.Session()
with change_dir(mlflow_project_path):
tar_file = create_tar_file(
os.listdir(), target='code.tar.gz'
)
key_prefix = 'submitted_code'
# upload code from mlflow example to s3
s3_uri = sm_session.upload_data(
path=str(tar_file),
key_prefix=key_prefix
)
# describe training
estimator = Estimator(image_name=image,
hyperparameters={
DIR_PARAM_NAME: s3_uri,
'alpha': '1.0',
'sagemaker_mlflow_experiment_id': '2.0',
},
role=sagemaker_role,
train_instance_count=1,
train_instance_type='ml.m4.xlarge')
# run training
estimator.fit()
# print s3 uri to MLFlow logged models
print(estimator.model_data)
Run training
python control_script.py
s3 uri with all mlflow logged models will be printed
This section describes how to solve different tasks that you usually should to solve while run training
mlflow cli support passing entrypoint parameters and other training script command line
arguments using -P, --param-list
parameter
These parameters should be passed using hyperparameters
parameter of Estimator
class
You can customize any extra parameters that are passed into mlflow run
using
sagemaker_mlflow_run_*
prefixed reserved hyperparameters as described in reference section
Inherit from base docker image and install required packages into conda environment with a name "training" or create another one and set $CONDA_TRAINING_ENV to new conda env name
Inherit from base docker image and override $MLFLOW_TRACKING_URI environment variable
All sagemaker_*
prefixed hyperparameters are reserved as sagemaker or framework hyperparameters
-
See list of
sagemaker_*
prefixed Amazon SageMaker Containers reserved hyperparameters on official reference -
sagemaker_mlflow_run
(sagemaker_mlflow_run_*) prefixed hyperparameters are reserved by the container to pass additional parameters tomlflow run
command