microsoft / dstoolkit-mlops-databricks

ML Ops Accelerator: Databricks & Azure Machine Learning Unification

License: MIT License

databricks azure mlops mlflow databricks-feature-store machine-learning cicd azure-machine-learning

dstoolkit-mlops-databricks's Introduction

[Banner image]

[Animated MLOps GIF visual (allow about 20 seconds to load)]



Version History And Updates

Azure ML integration is now live for GitHub deployment.

Development for Azure DevOps deployment is in progress.

MLOps for Databricks with CI/CD (GitHub Actions)


MLOps Architecture

[Image: MLOps architecture diagram]

Features to be included in future releases:

  • Model testing
  • Metrics & Monitoring

YouTube Demo (Slightly Outdated)

The deployment instructions in the video are slightly outdated (though still useful); please follow the instructions below instead. The video remains helpful for concepts beyond the deployment itself.

YouTube Demo


About This Repository

This repository contains a continuous integration and continuous deployment (CI/CD) framework for Azure Databricks, for delivering Data Engineering/Machine Learning projects based on the following Azure technologies:

  • Azure Databricks
  • Azure Log Analytics
  • Azure Monitor Service
  • Azure Key Vault

Azure Databricks is a powerful technology used ubiquitously by data engineers and data scientists. However, operationalizing it within a fully automated continuous integration and deployment setup can prove challenging.

The net effect is that data scientists and engineers spend a disproportionate amount of their time on DevOps matters. This repository's guiding vision is to automate as much of the infrastructure as possible.



Prerequisites

  • Github Account
  • Microsoft Azure Subscription
  • VS Code
  • Azure CLI Installed (This Accelerator is tested on version 2.39)


Details of The Solution Accelerator

  • Creation of four environments:
    • Sandbox
    • Development
    • User Acceptance Testing (UAT)
    • Production
  • Full CI/CD between environments
  • Infrastructure-as-code for interacting with the Databricks API and CLI
  • Azure Service Principal authentication
  • Azure resource deployment using Bicep
  • Databricks Feature Store + MLflow Tracking + Model Registry + Model Experiments (see the sketch after this list)
  • DBX by Databricks Labs for continuous deployment of jobs/workflows (source code and parameter files packaged within DBFS)
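
As a hedged illustration of what the MLflow tracking and Model Registry pieces look like inside a training script (the experiment path, metric name, and model name below are hypothetical, not taken from this repo):

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical experiment path; replace with a path in your workspace
mlflow.set_experiment("/Shared/mlops-demo")

X, y = make_regression(n_samples=200, n_features=5, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X, y)
    mlflow.log_metric("train_r2", model.score(X, y))
    # Logs the model and registers it under a hypothetical name; registration
    # assumes a registry-capable tracking server (e.g. Databricks)
    mlflow.sklearn.log_model(model, "model", registered_model_name="mlops_demo_model")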

Deployment Instructions

Create Repository

  • Fork this repository here
  • In your Forked Repo, click on 'Actions' and then 'Enable'
  • Within VS Code, click "View", then "Command Palette", then "Git: Clone", and finally select your repo

Login To Azure

  • All code throughout should be run in the VS Code PowerShell terminal
az login

# If there are multiple tenants in your subscription, ensure you specify the correct tenant: az login --tenant <tenant>

# ** Microsoft employees use: az login --tenant fdpo.onmicrosoft.com (new non-prod tenant)

GitHub Account

echo "Enter Your Git Username... "
# Example: "Ciaran28"
$Git_Configuration = "GitHub_Username"

GitHub Repos Within Databricks

echo "Enter Your Git Repo Url (this could be any Repository In Your Account )... "
# Example: "https://github.com/ciaran28/dstoolkit-mlops-databricks" 
$Repo_ConfigurationURL = ""

Update Parameter Files & Git Push To Remote

echo "From root execute... "

./setup.ps1


Create Environments

Follow the naming convention (case sensitive) shown below.

[Screenshot: environment names]
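
To double-check the names, here is a minimal sketch using the GitHub REST API; the owner value and the PAT_GITHUB environment variable are assumptions for illustration:

import os
import requests

OWNER = "<your-github-username>"  # placeholder
REPO = "dstoolkit-mlops-databricks"

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/environments",
    headers={"Authorization": f"Bearer {os.environ['PAT_GITHUB']}"},
)
resp.raise_for_status()
for env in resp.json()["environments"]:
    print(env["name"])  # must match the convention exactly (case sensitive)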

Secrets

For each environment, create GitHub secrets named ARM_CLIENT_ID, ARM_CLIENT_SECRET, and ARM_TENANT_ID, using the output in the VS Code PowerShell terminal from the previous step. (Note: the Service Principal shown below was destroyed, so the credentials are useless.)

[Screenshot: Service Principal credentials output]

In addition, generate a GitHub personal access token and use it to create a secret named PAT_GITHUB:

[Screenshot: PAT_GITHUB secret]



Final Snapshot of GitHub Secrets

Secrets in GitHub should look exactly like the snapshot below. Secret names are case sensitive, so be very careful when creating them.

[Screenshot: final GitHub secrets]



Deploy The Azure Environments

  • In GitHub you can manually run the pipeline to deploy the environments to Azure using "onDeploy.yaml" found here. Use the instructions below to run the workflow.

[Screenshot: manually running the onDeploy.yaml workflow]

  • Azure resources created (Production environment snapshot). For speed, I have commented out all environment deployments except Sandbox; update onDeploy.yaml to deploy all environments.

[Screenshot: Azure resources created]



Running Pipelines

  • The end-to-end machine learning pipeline comes pre-configured in the "Workflows" section in Databricks. It uses a Job Cluster, which automatically uploads the necessary dependencies contained within a Python wheel file.

  • If you wish to run the machine learning scripts from a notebook instead, first upload the dependencies yourself (automatic upload is in development): navigate to the Python wheel file in the dist/ folder and manually install it on the cluster the notebook will run against. A sketch of scripting this step follows.
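
As a hedged sketch of how that manual step could be scripted, the snippet below calls the Databricks Libraries API to install a wheel on a running cluster; the workspace URL, cluster ID, and DBFS wheel path are placeholders, and the token is read from an environment variable:

import os
import requests

HOST = "https://adb-XXXXXXXXXXXXXXXX.XX.azuredatabricks.net"        # placeholder
CLUSTER_ID = "0101-000000-abcdefgh"                                 # placeholder
WHEEL = "dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl"  # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={"cluster_id": CLUSTER_ID, "libraries": [{"whl": WHEEL}]},
)
resp.raise_for_status()  # returns 200 with an empty body on success
print("Install requested; check the cluster's Libraries tab.")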



Continuous Deployment And Branching Strategy

The branching strategy is configured automatically as part of the accelerator. It follows a GitHub Flow paradigm, with some nuances, in order to facilitate rapid continuous integration (see footnote 1, which links the SST Git Flow article written by Willie Ahlers for the Data Science Toolkit and provides a narrative explaining the numbers below).[^1]

The branching strategy is easy to change by updating the "if" conditions within .github/workflows/onRelease.yaml.

[Diagram: branching strategy]

  • Pull request from feature branch to main branch: CI tests
  • Pull request approved from feature branch to main branch: CD to the Development environment
  • Pull request from main branch to release branch: CI tests
  • Pull request approved from main branch to release branch: CD to the User Acceptance Testing (UAT) environment
  • Tag version and push to release branch: CD to the Production environment
  • Naming conventions for branches (required for the CD pipelines to deploy; see onRelease.yaml for more details):
    • Feature branches: "feature/"
    • Main branch: "main"
    • Release branches: "release/"


MLOps Paradigm: Deploy Code, not Models

In most situations, Databricks recommends that during the ML development process, you promote code, rather than models, from one environment to the next. Moving project assets this way ensures that all code in the ML development process goes through the same code review and integration testing processes. It also ensures that the production version of the model is trained on production code. For a more detailed discussion of the options and trade-offs, see Model deployment patterns.

https://learn.microsoft.com/en-us/azure/databricks/machine-learning/mlops/deployment-patterns

[Diagram: deploy code, not models]



Feature Store Integration

In an organization, thousands of features are buried in different scripts and in different formats; they are not captured, organized, or preserved, and thus cannot be reused and leveraged by teams other than those who generated them.

Because feature engineering is so important for machine learning models and features cannot be shared, data scientists must duplicate their feature engineering efforts across teams.

To solve these problems, the concept of a feature store was developed, so that (see the sketch after this list):

  • Features are centralized in an organization and can be reused
  • Features can be served in real-time with low latency
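
As a hedged illustration (not code from this repo), the sketch below uses the Databricks Feature Store client to create and populate a feature table. The table and column names are hypothetical, and it assumes it runs on a Databricks cluster where databricks-feature-store and a spark session are available:

from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Hypothetical feature DataFrame computed elsewhere in the pipeline;
# it must include the primary key column.
features_df = spark.createDataFrame(
    [(1, 0.3, 12), (2, 0.7, 5)],
    ["customer_id", "avg_spend", "visit_count"],
)

# Create the table once...
fs.create_table(
    name="feature_store.customer_features",  # hypothetical database.table name
    primary_keys=["customer_id"],
    df=features_df,
    description="Example customer features",
)

# ...then merge in fresh feature values on subsequent runs
fs.write_table(
    name="feature_store.customer_features",
    df=features_df,
    mode="merge",
)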

[Diagram: feature store]

dstoolkit-mlops-databricks's People

Contributors: ciaran28, clintgrove, lovinggracem, microsoft-github-operations[bot], microsoftopensource


dstoolkit-mlops-databricks's Issues

Python Wheel File For Local Cluster

At present, you can run the machine learning scripts from the Databricks workflows, or choose to run them on the Databricks interactive cluster.

The workflows have the Python wheel files attached as standard.

However, the Python wheel file is not uploaded onto the interactive Databricks cluster "ml_cluster".

How to troubleshoot failure in DBX Deploy - Workflow Artifacts - 401

Hello, thanks for providing this accelerator. I am seeing a failure in the deploy of DBX workflow artifacts.

Error text:

HTTPError: 401 Client Error: Unauthorized for url: 
https://adb-2652455892987519.19.azuredatabricks.net/api/2.0/workspace/mkdirs

What is the best way for me to debug? I am fairly new to Azure but understand cloud in general. Happy to provide any other details.

Is there something I need to do to enable the Databricks API?

My deploys continue to fail in building the clusters with errors like:

HTTPError: 401 Client Error: Unauthorized for url: 
https://adb-3459894219904510.10.azuredatabricks.net/api/2.0/workspace/mkdirs
Error: Process completed with exit code 1

If I try to access that URL directly I also get a 401 error while logged in as my AD user. But if I go to
https://adb-3459894219904510.10.azuredatabricks.net/ I get the Databricks instance page. Both the service principal and my AD user have Owner permissions on the Databricks instance.

Is there something I need to do to provision the API?
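
A common cause of 401s like this is calling the workspace API with a token that is not an Azure AD token issued for the Databricks resource. As a hedged debugging sketch (the tenant ID is a placeholder; the secrets are read from the same environment variables the pipeline uses), you can request a token for the well-known Databricks resource ID and call the workspace API directly:

import os
import requests

TENANT_ID = "<your-tenant-id>"  # placeholder
WORKSPACE = "https://adb-3459894219904510.10.azuredatabricks.net"

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure AD
# resource ID for Azure Databricks.
token_resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": os.environ["ARM_CLIENT_ID"],
        "client_secret": os.environ["ARM_CLIENT_SECRET"],
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
)
token_resp.raise_for_status()
token = token_resp.json()["access_token"]

# A 200 here means the service principal can reach the workspace API
check = requests.get(
    f"{WORKSPACE}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
print(check.status_code, check.text[:200])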

keyvaults are created in UK South

I have been trying to debug the API failing due to bad PATs. I just noticed that in all of the Databricks resource groups, everything is in US East except the key vaults. Is this correct?

[Screenshot: resource group locations]

AKV10032: Invalid issuer error

The tenant ID had been hard-coded in the Key Vault Bicep templates.

Solution: replace the hard-coded string with "subscription().tenantId".

First Time Creation of Databricks Custom Role

Concurrent jobs will initially see no role and will create it. An error follows if another concurrent job has created the role in the interim.

Short-term fix: re-run the pipeline when it fails the first time.

Long-term fix: remove role creation from the concurrent pipelines.
