isabella232 / cml_amp_continuous_model_monitoring

This project forked from cloudera/cml_amp_continuous_model_monitoring

Demonstration of how to perform continuous model monitoring on CML using Model Metrics and Evidently.ai dashboards

License: Apache License 2.0

Python 20.60% Shell 0.01% CSS 74.64% JavaScript 1.69% SCSS 0.81% HTML 2.26%

Continuous Model Monitoring

A demonstration of how to perform continuous model monitoring on Cloudera Machine Learning (CML) using the Model Metrics feature and Evidently.ai's open-source monitoring dashboards.

After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. In practice, however, a trained model is never final, and this milestone marks just the beginning of a new chapter in the ML lifecycle called production ML. This is because most machine learning models are static, but the world we live in is changing all the time. Such changes in environmental conditions are referred to as concept drift, and they cause the predictive performance of a model to degrade over time, eventually making it obsolete for the task it was initially intended to solve.

For an in-depth description of this problem space, please see our research report:

FF22: Inferring Concept Drift Without Labeled Data

To combat concept drift in production systems, it's important to have robust monitoring capabilities that alert stakeholders when relationships in the incoming data or model have changed. In this Applied Machine Learning Prototype (AMP), we demonstrate how this can be achieved on CML. Specifically, we leverage CML's Model Metrics feature in combination with Evidently.ai's Data Drift, Numerical Target Drift, and Regression Performance reports to monitor a simulated production model that predicts housing prices over time.
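To make the idea of data drift concrete, here is a minimal sketch of one common drift statistic, the two-sample Kolmogorov-Smirnov statistic, written in plain Python. It is not how Evidently computes its reports; the function names, the sample prices, and the 0.3 threshold are all assumptions chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two
    samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Empirical CDFs only change at observed values, so it is enough
    # to compare them at every value seen in either sample.
    for x in set(ref) | set(cur):
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


def drift_detected(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds an (arbitrary) threshold."""
    return ks_statistic(reference, current) > threshold


# Illustrative, made-up feature values for a training and a production batch.
training_sqft = [900, 1100, 1300, 1500, 1700, 1900]
production_sqft = [1500, 1700, 1900, 2100, 2300, 2500]
shifted = drift_detected(training_sqft, production_sqft)
```

In practice a threshold like this would be tuned per feature, or replaced with a p-value from a statistical test, since a fixed cutoff behaves differently at different sample sizes.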

Project Structure

.
├── LICENSE
├── README.md
├── .project-metadata.yaml              # declarative specification for AMP logic
├── apps
│   ├── reports                         # folder to collect monitoring reports
│   └── app.py                          # Flask app to serve monitoring reports
├── cdsw-build.sh                       # build script for model endpoint
├── data                                # directory to hold raw and working data artifacts
├── requirements.txt
├── scripts
│   ├── install_dependencies.py         # commands to install python package dependencies
│   ├── predict.py                      # inference script that utilizes cdsw.model_metrics
│   ├── prepare_data.py                 # splits raw data into training and production sets
│   ├── simulate.py                     # script that runs simulated production logic
│   └── train.py                        # build and train an sklearn pipeline for regression
├── setup.py
└── src
    ├── __init__.py
    ├── api.py                          # utility class for working with CML APIv2
    ├── inference.py                    # utility class for concurrent model requests
    ├── simulation.py                   # utility class for simulation logic
    └── utils.py                        # various utility functions

By launching this AMP on CML, the following steps will be taken to recreate the project in your workspace:

  1. A Python session is run to install all required project dependencies
  2. A Python session is run to split the raw data into training and production sets, which are then saved locally
  3. A scikit-learn pipeline with preprocessing and ridge regression steps is constructed and used in a cross-validated grid search to select the best estimator among a set of hyperparameters. This pipeline is saved to the project.
  4. The pipeline is deployed as a hosted REST API with CML's Model Metrics feature enabled to track each prediction in a managed Postgres database for later analysis.
  5. A simulation is run that iterates over the production dataset in monthly batches. For each new month of production data (of which there are six total), the simulation will:
    • Look up newly listed properties from the batch and predict their sale prices using the deployed model
    • Look up newly sold properties from the batch and track their ground truth values by joining to the original prediction record in the metric store
    • Calculate drift metrics and deploy a set of refreshed Evidently monitoring reports via a CML Application
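The monthly loop in step 5 can be sketched roughly as follows. This is a simplified illustration, not the project's actual simulation code: the `InMemoryMetricStore` class, the `stub_model` function, the record field names, and the sample rows are all hypothetical stand-ins for the deployed model endpoint and the managed metric store.

```python
import uuid


class InMemoryMetricStore:
    """Hypothetical stand-in for the managed metric store."""

    def __init__(self):
        self.records = {}

    def log_prediction(self, property_id, predicted_price):
        # Each prediction gets its own identifier so ground truth
        # can be attached to it later.
        prediction_id = str(uuid.uuid4())
        self.records[prediction_id] = {
            "property_id": property_id,
            "predicted_price": predicted_price,
            "actual_price": None,
        }
        return prediction_id

    def add_ground_truth(self, prediction_id, actual_price):
        self.records[prediction_id]["actual_price"] = actual_price


def stub_model(features):
    # Placeholder for a call to the deployed model endpoint.
    return 100.0 * features["sqft"]


def run_simulation(rows, model, store):
    """Iterate over monthly batches, scoring listings and joining sales."""
    months = sorted({r["list_month"] for r in rows} | {r["sold_month"] for r in rows})
    prediction_ids = {}  # property id -> prediction id
    summary = {}
    for month in months:
        # Newly listed properties: predict and log the prediction.
        newly_listed = [r for r in rows if r["list_month"] == month]
        for row in newly_listed:
            price = model(row["features"])
            prediction_ids[row["id"]] = store.log_prediction(row["id"], price)

        # Newly sold properties: attach the sale price to the
        # prediction that was logged when the property was listed.
        newly_sold = [
            r for r in rows
            if r["sold_month"] == month and r["id"] in prediction_ids
        ]
        for row in newly_sold:
            store.add_ground_truth(prediction_ids[row["id"]], row["sale_price"])

        summary[month] = (len(newly_listed), len(newly_sold))
    return summary


# Made-up rows for illustration only.
rows = [
    {"id": 1, "list_month": "2015-01", "sold_month": "2015-02",
     "features": {"sqft": 1000}, "sale_price": 105000.0},
    {"id": 2, "list_month": "2015-01", "sold_month": "2015-01",
     "features": {"sqft": 2000}, "sale_price": 190000.0},
    {"id": 3, "list_month": "2015-02", "sold_month": "2015-03",
     "features": {"sqft": 1500}, "sale_price": 150000.0},
]
store = InMemoryMetricStore()
summary = run_simulation(rows, stub_model, store)
```

Keeping a mapping from property to prediction identifier is what makes the delayed join possible: the sale price arrives in a later batch than the prediction, so the two have to be linked by a key that was stored at prediction time.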

Upon successful recreation of the project (which may take ~20 minutes), the simulation will have produced six sets of monitoring reports, one set of three reports for each month of "production" records, and saved those reports to the apps/static/reports/ directory. Reports should be accessed via the custom dashboard running as an Application in CML and can be used to determine if and where drift is occurring within each new batch of data. We encourage users to peruse the simulation logic and documentation directly for a detailed look at how new records are scored, logged, and queried to generate monitoring reports. Since the simulation is intended to mimic a production scenario, the deployed application is refreshed in-place with results from each new batch of data.
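As a rough illustration of what a per-batch regression performance check looks for, the sketch below computes the mean absolute error for each month of joined records and flags months that are noticeably worse than a training baseline. It does not reproduce Evidently's Regression Performance report; the function names, the record fields, the sample values, and the 1.5x tolerance are assumptions.

```python
from statistics import mean


def mean_absolute_error(records):
    """MAE over records that already have a ground truth value."""
    errors = [
        abs(r["actual_price"] - r["predicted_price"])
        for r in records
        if r["actual_price"] is not None
    ]
    return mean(errors) if errors else None


def flag_degraded_batches(batches, baseline_mae, tolerance=1.5):
    """Return the months whose MAE exceeds tolerance * baseline_mae."""
    flagged = []
    for month in sorted(batches):
        mae = mean_absolute_error(batches[month])
        if mae is not None and mae > tolerance * baseline_mae:
            flagged.append(month)
    return flagged


# Made-up joined records (prediction plus ground truth) per month.
batches = {
    "2015-01": [
        {"predicted_price": 100.0, "actual_price": 110.0},
        {"predicted_price": 200.0, "actual_price": 190.0},
    ],
    "2015-02": [
        {"predicted_price": 100.0, "actual_price": 140.0},
        {"predicted_price": 200.0, "actual_price": 160.0},
        {"predicted_price": 300.0, "actual_price": None},  # not sold yet
    ],
}
degraded = flag_degraded_batches(batches, baseline_mae=10.0)
```

Records without a sale price are skipped rather than counted as zero error, which matters in a setting like this one where ground truth arrives later than the prediction.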

Launching the Project on CML

This AMP was developed against Python 3.6. There are two ways to launch the project on CML:

  1. From Prototype Catalog - Navigate to the AMPs tab on a CML workspace, select the "Continuous Model Monitoring" tile, click "Launch as Project", click "Configure Project"
  2. As an AMP - In a CML workspace, click "New Project", add a Project Name, select "AMPs" as the Initial Setup option, copy in this repo URL, click "Create Project", click "Configure Project"

Contributors

andrewrreed, dependabot[bot]
