azure / mlops-v2

Azure MLOps (v2) solution accelerators. Enterprise ready templates to deploy your machine learning models on the Azure Platform.

Home Page: https://learn.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment

License: MIT License

Shell 100.00%
azure azuremachinelearning azureml deep-learning devops machine-learning microsoft mlops mlops-workflow mlops-environment

mlops-v2's Introduction

Azure MLOps (v2) Solution Accelerator


Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting point for MLOps implementation in Azure.

MLOps is a set of repeatable, automated, and collaborative workflows with best practices that empower teams of ML professionals to quickly and easily get their machine learning models deployed into production. You can learn more about MLOps in the Azure Machine Learning documentation linked above.

Project overview

The solution accelerator provides a modular end-to-end approach for MLOps in Azure based on pattern architectures. As each organization is unique, solutions will often need to be customized to fit the organization's needs.

The solution accelerator goals are:

  • Simplicity
  • Modularity
  • Repeatability & Security
  • Collaboration
  • Enterprise readiness

It accomplishes these goals with a template-based approach for end-to-end data science, driving operational efficiency at each stage. You should be able to get up and running with the solution accelerator in a few hours.

Prerequisites

  1. An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
    • Important - If you use a Free/Trial subscription, or a learning-oriented subscription such as Visual Studio Premium with MSDN, some provisioning tasks might not run as expected due to 'Usage + quotas' limits on your subscription. Specific instructions are provided before the provisioning steps throughout the guide; you are strongly advised to read them carefully.
  2. For Azure DevOps-based deployments and projects:
  3. For GitHub-based deployments and projects:
  4. Git bash, WSL, or another shell script editor on your local machine

Documentation

  1. Solution Accelerator Concepts and Structure - Philosophy and organization
  2. Architectural Patterns - Supported Machine Learning patterns
  3. Accelerator Deployment Guides - How to deploy and use the solution accelerator with Azure DevOps or GitHub
  4. Quickstarts - Pre-created project scenarios for demos/POCs. Azure DevOps ADO Quickstart.
  5. YouTube Videos: Deploy MLOps on Azure in Less Than an Hour and AI Show

Contributing

This project welcomes contributions and suggestions. To learn more, see CONTRIBUTING.md for details.

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

mlops-v2's People

Contributors

akritim, azeltov, chrey-gh, cindyweng, drosevear, jawadaminmsft, lostmygithubaccount, manu-kanwarpal, mariamedp, microsoft-github-policy-service[bot], microsoftopensource, msteller-ai, ncostar, nicoleserafino, sdonohoo, setuc, strugdt, timschps


mlops-v2's Issues

[QuickStart.md] Error when running deploy-model-training-pipeline with enable_aml_secure_workspace: true

Hey,

After deploying the infrastructure with "tf-ado-deploy-infra.yml" and "enable_aml_secure_workspace: true", did you test "deploy-model-training-pipeline" in Azure DevOps?

Stack:

Screenshot 2022-09-19 at 12 40 58

  1. My conclusion is that the stack trace is expected because a public agent is used
    pool:
    vmImage: $(ap_vm_image)
    where ap_vm_image: ubuntu-20.04
    to perform actions in the Azure Machine Learning workspace (e.g. pipeline.publish(config['training_pipeline_name'])). If you are able to perform this action on your side, isn't it a security issue? Do you plan to add self-hosted agent deployment and configuration to the Terraform?

  2. Additionally, don't you need a service principal scoped to the Machine Learning workspace to perform AML workspace actions? Should the instructed Azure-ARM-Prod (subscription-level) service connection be able to perform any actions?

Model Promotion

I wasn't able to find anything in the code or in the documentation related to model promotion and data drift.

If I have a daily ingestion of new data and I run the training pipeline on a daily basis, at some point the daily model will probably have better metrics than the one deployed.

How can I replace the model ONLY if the metrics are better? Could you provide some guidance or links if they exist?
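The accelerator does not appear to ship such a gate out of the box; one common pattern is to compare the candidate run's metrics against those of the currently registered model and register only on improvement. A minimal, hypothetical sketch (the function name, the "accuracy" metric, and the metric dicts are illustrative; in practice the values would come from your MLflow runs or the AML model registry):

```python
# Hypothetical promotion gate: register/deploy the candidate model only when
# it beats the production model on the chosen metric. All names here are
# illustrative assumptions, not part of the mlops-v2 accelerator.

def should_promote(candidate_metrics, production_metrics,
                   metric="accuracy", higher_is_better=True):
    """Return True if the candidate model beats production on `metric`."""
    cand = candidate_metrics.get(metric)
    prod = production_metrics.get(metric)
    if cand is None:
        return False      # candidate was never evaluated on this metric
    if prod is None:
        return True       # nothing deployed yet, so promote
    return cand > prod if higher_is_better else cand < prod
```

A daily training pipeline could call this after evaluation and skip the registration/deployment stages when it returns False.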

Job naming issues with AML CLI training pipeline

Why?

Currently, when we deploy the training pipeline using the AML CLI, we run into the problem that job names cannot be validated.

image

How?

As the screenshot above shows, hyphens aren't allowed in the naming conventions for jobs, so we should consider replacing hyphens with underscores in the pipeline.yml file.

I could also create a PR if you consider this as a valid way of solving this issue.
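The proposed fix can be sketched as a small helper applied to the job names before they go into pipeline.yml. The exact AML naming rule is assumed from the screenshot (word characters only), so treat the regex as illustrative rather than the authoritative constraint:

```python
# Illustrative sketch of the proposed fix: replace hyphens (and any other
# character assumed to be disallowed) in AML job names with underscores.
import re

def sanitize_job_name(name: str) -> str:
    """Map a job name onto letters, digits, and underscores only."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```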

Anything else?

[QUICKSTART.md] pipeline build failing when creating compute cluster.

Describe the bug or the issue that you are facing

Outer Loop: Deploying Infrastructure via Azure DevOps

Error: waiting for creation of Compute Cluster "cpu-cluster" in workspace "mlw-mlopsv2-0769dev" Code="BadRequest" Message=The specified subscription has a total vCPU quota of 0 and cannot accomodate for at least 1 requested managed compute node which maps to 2 vCPUs.

My guess is that, as BatchAI is deprecated, it cannot allocate the CPU cluster. I have also tried changing regions and VM sizes; none of them seems to work. Or it could be that my 'Azure for Students' subscription doesn't have the quota. If it is my subscription quota, is there a way around it?

Screenshot 2022-12-26 at 14 46 22

Steps/Code to Reproduce

Run the pipeline as usual in region eastus.

Expected Output

Demo running as intended

Versions

Azure DevOps
Terraform
Azure ML CLI

Which platform are you using for deploying your infrastructure?

Azure DevOps (ADO)

If you mentioned Others, please mention which platform you are using?

No response

What are you using for deploying your infrastructure?

Terraform

Are you using Azure ML CLI v2 or Azure ML Python SDK v2?

Azure ML CLI v2

Describe the example that you are trying to run?

Classical ML provided with the repo

Batch Endpoint - AutoML

Hi,

We opted for automated ML, ran the job, and registered the model in the dev workspace.
Now we want to deploy it in the test and prod workspaces. We have downloaded the .pkl file, scoring script, and conda env file, which are all required for deployment.

image

But it seems the generated scoring script is applicable only to online deployment. So is batch deployment not possible with automated ML?

Thanks

The operation was canceled

When I run the Outer Loop: Deploying Infrastructure via Azure DevOps pipeline, Azure DevOps gives me this error:

2022-08-04T07:59:48.3348777Z [command]/opt/hostedtoolcache/terraform/0.14.7/x64/terraform plan -var location=westus -var prefix=mlopsv2 -var postfix=1699 -var environment=prod -var enable_aml_computecluster=true -detailed-exitcode
2022-08-04T07:59:50.9124061Z Acquiring state lock. This may take a few moments...
2022-08-04T07:59:52.1053307Z var.client_secret
2022-08-04T07:59:52.1054837Z Service Principal Secret
2022-08-04T07:59:52.1055171Z
2022-08-04T08:59:23.3030923Z Enter a value:
2022-08-04T08:59:23.3501608Z ##[error]The operation was canceled.
2022-08-04T08:59:23.3508829Z ##[section]Finishing: Terraform plan

Correct Deployment of Dev Environments

Hello,

Thank you for this repo, and all your hard work maintaining it. I am trying to deploy the quickstart guide - I have successfully been able to deploy the outer loop prod environment using Azure DevOps.

However, my understanding was that I would end up with a rg-mlopsv2-XXXprod and an rg-mlopsv2-XXXdev - an area to do the dev work on the project, and an area to host the production model (in the future, I want to expand to also contain a UAT env). The pipelines would allow me to move models between the two. However, after running the tf-ado-deploy-infra.yml pipeline I only have the prod environment. I tried creating a new branch, called dev, and reran the pipeline. However, now, during the Terraform Init step I get an error:

##[error]Error: There was an error when attempting to execute the process '/opt/hostedtoolcache/terraform/0.14.7/x64/terraform'. This may indicate the process failed to start. Error: spawn /opt/hostedtoolcache/terraform/0.14.7/x64/terraform ENOENT

The guide always mentions staying on main and doesn't mention creating a dev environment. Have I conceptually misunderstood the architecture (i.e., there shouldn't be an rg-mlopsv2-XXXdev), or, if I do need to create this environment, how do I overcome my error?

How to achieve Continuous Integration

Hi,

As per the process, we are training and registering the model in the DEV workspace. But how do we bring/deploy that model to Stage?

It would be better if we had the steps between registering the model in the dev workspace and deploying it in the Stage workspace.

How to stop the instance in online deployment when not required

Hi,

In the online-deployment config file, we specify the instance type and instance count.

image

But where is the instance actually created? We tried to locate it in the compute section of the ML workspace but could not find it.

And the minimum value of instance count is 1 (we tried making it 0 and enabling scale settings, but we get an error). So the instance is always running, which incurs additional cost when not in use.

So please suggest a way to stop the instance/avoid the additional cost when not in use.

Thanks,
M.Murugeswari

Quickstart.md - Job: Create Bicep Deployment issue with the storage account name

Why?

While running the pipeline with the Create Bicep Deployment job, we get the error: {'code': 'StorageAccountAlreadyTaken', 'target': 'stmlopsv2819prod', 'message': 'The storage account named stmlopsv2819prod is already taken.'}

I believe storage account names must be globally unique, and since this was already executed before, I can't create a storage account with the same name.

How?

I believe in both files, config-infra-dev.yml and config-infra-prod.yml, this line needs to change: st$(namespace)$(postfix)$(environment)

and we might have to add some initials of our company, subscription, or even our name.

Thank you,
Carla
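One way to implement the suggestion: derive the name from the existing st$(namespace)$(postfix)$(environment) pattern plus a short hash of an organization-specific seed, while respecting Azure's storage account rules (3-24 characters, lowercase letters and digits only). A hypothetical sketch; the seed (company initials, subscription ID, etc.) is whatever your organization chooses:

```python
# Hypothetical helper for a globally unique, valid storage account name:
# lowercase alphanumeric, at most 24 characters. The `seed` argument is an
# assumption supplied by the user, e.g. a subscription ID.
import hashlib

def storage_account_name(namespace, postfix, environment, seed, max_len=24):
    base = f"st{namespace}{postfix}{environment}".lower()
    base = "".join(c for c in base if c.isalnum())
    # 4 hex chars derived from the seed make the name unique per organization
    suffix = hashlib.sha256(seed.encode()).hexdigest()[:4]
    return base[: max_len - len(suffix)] + suffix
```

The same idea could be expressed directly in the config-infra-*.yml variable, but computing it once in a script avoids collisions across teams reusing the template defaults.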

QUICKSTART.md: Deploy AML Workspace - Terraform Apply Failed - Resource Group Name Already Exists

When I run the ADO pipeline (Section: Outer Loop: Deploying Infrastructure via Azure DevOps, Step 8: Run the pipeline), I keep getting "resource group already exists" errors, even though I have changed the names to be unique several times (and each time deleted the corresponding resource groups in the Azure portal). Please help. Thanks a lot!

I put the error below:

Error: A resource with the ID "/subscriptions/7b60df94-2d4a-463d-9468-5ed1cce920b2/resourceGroups/rg-mlopsv2hez-0919prod" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_resource_group" for more information.

on modules/resource-group/main.tf line 1, in resource "azurerm_resource_group" "adl_rg":
1: resource "azurerm_resource_group" "adl_rg" {

Releasing state lock. This may take a few moments...
##[error]Error: The process '/opt/hostedtoolcache/terraform/0.14.7/x64/terraform' failed with exit code 1
Finishing: Terraform apply


Unable to register the model as mlflow model

Hi,

We ran the model training pipeline and registered the model as an mlflow model in the dev workspace.

image

We then used a release pipeline to register the model in the test workspace. Kindly find the command below:

image

But the model is registered as a custom model.

image

This was working fine before, but now it registers as a custom model only. Please let me know the solution.

[repo] Automate service principal setup?

Why?

Currently the QUICKSTART.md includes a lengthy GUI setup for creating a couple of Service Principals for later use. Couldn't this be automated with the Azure CLI? Then this section could simply run a script, or be included in the infrastructure setup.

How?

Probably a series of az ad sp commands. Should be idempotent.

Anything else?

A user also needs to ensure they're using the same names. All of this should go in a configuration file and be used throughout the project. @dkmiller may have some thoughts here too; perhaps that hydra tool would be good here?

[repo] ARM template within the bicep folders

Why?

Incorrect file location

How?

Currently there is a main.json ARM template at infrastructure/bicep/main.json. It should be removed (if it was only used to create the Bicep) or moved to infrastructure/arm, which is currently empty.

Getting error while creating Responsible Ai dashboard

I am trying to use different algorithms for training. The model training and model registration parts complete successfully, but the RAI insights dashboard constructor fails. I am using an xgboost model for training.

Traceback (most recent call last):
  File "create_rai_insights.py", line 184, in <module>
    main(args)
  File "create_rai_insights.py", line 128, in main
    model_estimator = load_mlflow_model(my_run.experiment.workspace, model_id=model_id)
  File "/mnt/azureml/cr/j/exe/wd/rai_component_utilities.py", line 79, in load_mlflow_model
    return mlflow.pyfunc.load_model(model_uri)._model_impl
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 484, in load_model
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 494, in _load_pyfunc
    return _load_model_from_local_file(path=path, serialization_format=serialization_format)
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 452, in _load_model_from_local_file
    return cloudpickle.load(f)
ModuleNotFoundError: No module named 'xgboost'

Thank You

Authorisation problem when deploying training pipeline

Hello,

I have been following your quick start guide, and have got to the stage where I need to deploy the pipeline "deploy-model-training-pipeline.yml" on Azure DevOps.

When I run this, it goes as far as the Run pipeline in AML step in DevOps, then I get this error:

If there is an Authorization error, check your Azure KeyVault secret named kvmonitoringspkey. Terraform might put single quotation marks around the secret. Remove the single quotes and the secret should work.
.create table mlmonitoring (['Sno']: int, ['Age']: int, ['Sex']: string, ['Job']: int, ['Housing']: string, ['Saving accounts']: string, ['Checking account']: string, ['Credit amount']: int, ['Duration']: int, ['Purpose']: string, ['Risk']: string, ['timestamp']: datetime)
Cleaning up all outstanding Run operations, waiting 300.0 seconds
2 items cleaning up...
Cleanup took 0.08368873596191406 seconds
Traceback (most recent call last):
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 68, in acquire_authorization_header
    return _get_header_from_dict(self.token_provider.get_token())
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 123, in get_token
    token = self._get_token_impl()
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 554, in _get_token_impl
    return self._valid_token_or_throw(token)
  File "/azureml-envs/XXXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 201, in _valid_token_or_throw
    raise KustoClientError(message)
azure.kusto.data.exceptions.KustoClientError: ApplicationKeyTokenProvider - failed to obtain a token. 
invalid_client
AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXX'.
Trace ID: XXXXXX
Correlation ID: XXXXXXX
Timestamp: 2022-11-01 15:46:31Z

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "prep.py", line 83, in <module>
    main()
  File "prep.py", line 80, in main
    log_training_data(df, args.table_name)
  File "prep.py", line 35, in log_training_data
    collector.batch_collect(df)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 158, in batch_collect
    self.create_table_and_mapping()
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 132, in create_table_and_mapping
    self.kusto_client.execute_mgmt(self.database_name, CREATE_TABLE_COMMAND)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 891, in execute_mgmt
    return self._execute(self._mgmt_endpoint, database, query, None, self._mgmt_default_timeout, properties)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 959, in _execute
    request_headers["Authorization"] = self._aad_helper.acquire_authorization_header()
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 72, in acquire_authorization_header
    raise KustoAuthenticationError(self.token_provider.name(), error, **kwargs)
azure.kusto.data.exceptions.KustoAuthenticationError: KustoAuthenticationError('ApplicationKeyTokenProvider', 'KustoClientError("ApplicationKeyTokenProvider - failed to obtain a token. \ninvalid_client\nAADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXXXX'.\r\nTrace ID: XXXXXXXX\r\nTimestamp: 2022-11-01 15:46:31Z")', '{'authority': 'XXXXXX', 'client_id': 'XXXXX', 'kusto_uri': 'https://adxmlopsv286309prod.uksouth.kusto.windows.net'}')

I have done some investigating:

  • It appears my Prod Service Principal is set up correctly. When I go to Project Settings > Service Connections > Azure-ARM-Prod > Edit, there is an option to verify the connection. It works here.
  • When I check my App Registrations and investigate the "Certificates and Secrets" tab of "Azure-ARM-Prod-mlops-sparse", there is indeed a secret in there. (The Terraform pipeline to create the Prod infrastructure works, so I am led to believe that my SPs are working correctly.)
  • When I go to my Prod Key Vault, there is a secret called kvmonitoringspkey - looking at this, though, the Secret Value is just $(CLIENT_SECRET) - is it meant to be this? If so, why? And where was it set to this?

Do you have any advice on how I can fix this error?

[repo] Deploy Batch Scoring pipeline - Test is failing

What?

When running the deployment of the batch inferencing pipeline from the template https://github.com/Azure/mlops-templates/blob/main/templates/aml-cli-v2/test-deployment.yml

the part where it tests the newly created endpoint fails on a missing argument:

https://github.com/Azure/mlops-templates/blob/84aff2f4e13c5bc0b86477118cf5279fac3cfaf9/templates/aml-cli-v2/test-deployment.yml#L7-L19

ERROR: the following arguments are required: --input

According to the docs, --input is mandatory - maybe a recent change (https://docs.microsoft.com/en-us/cli/azure/ml/batch-endpoint?view=azure-cli-latest#az-ml-batch-endpoint-invoke)

issues with mlops-v2/sparse_checkout.sh

I bumped into the following issues while running mlops-v2/sparse_checkout.sh. I worked around them because I knew what the commands were trying to accomplish.

Issue 1:

Workaround - used this command (not sure why the above doesn't work):
git clone https://github.com/azure/mlops-project-template --depth 1 --filter=blob:none MLOps711Demo

Issue 2:

  • git init -b main
    error: unknown switch `b'
    usage: git init [-q | --quiet] [--bare] [--template=] [--shared[=]] []

    --template
    directory from which templates will be used
    --bare create a bare repository
    --shared[=]
    specify that the git repository is to be shared amongst several users
    -q, --quiet be quiet
    --separate-git-dir
    separate git dir from working tree

Workaround (git init -b requires Git 2.28 or newer): first initialized the repo, then renamed the branch from master to main:

git init
git add . && git commit -m 'initial commit'
git branch -m master main

Quickstart.md - Job: Create Bicep Deployment issue with does not have authorization to perform action

WHY?
While running the pipeline with the Create Bicep Deployment job, we get the error:

ERROR: {'code': 'AuthorizationFailed', 'message': "The client '0ddf4906-d1f2-4d9f-a900-91af50dba0ec' with object id '0ddf4906-d1f2-4d9f-a900-91af50dba0ec' does not have authorization to perform action 'Microsoft.Resources/deployments/validate/action' over scope '/subscriptions/f1af7a1f-db16-4c85-b133-e7c3c1f094cc' or the scope is invalid. If access was recently granted, please refresh your credentials."}
##[error]Script failed with exit code: 1

image

Misleading Architecture Diagram & README.md

I think that the wording of the documentation and the architecture diagram are misleading.

"Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production"

In this accelerator you are not deploying the model between the different environments, you are promoting the training pipeline. Azure ML Studio doesn't currently support sharing resources across Workspaces (though this is being rectified in https://github.com/Azure/azureml-previews/tree/main/previews/registries).

Could the documentation be updated to make it clear that you are deploying the inner loop in each of the environments?

This confusion is seen in other issues such as #36

[QuickStart.md] Error when running deploy-model-training-pipeline due to container security

When running the deploy-model-training-pipeline.yml [QuickStart.md - Step "Inner Loop: Deploying Classical ML Model Development / Moving to Test Environment"] in DevOps I receive the following container security error:

##[warning]cv/aml-cli-v2/data-science/environment/Dockerfile - Container usage from external registry 'nvcr.io' found.
##[error]Container security analysis found 1 violations. This repo has one or more docker files having references to images from external registries. Please review https://aka.ms/containers-security-guidance to remove the reference of container images from external registries. Please reach out via teams (https://aka.ms/cssc-teams) or email ([email protected]) for any questions or clarifications.

I assume the problem is either this docker image here:
https://github.com/Azure/mlops-project-template/blob/62cd04cb283fb46580558e17117ed701f90dfcbe/classical/aml-cli-v2/mlops/azureml/train/train-env.yml#L3

Or this docker image:
https://github.com/Azure/mlops-project-template/blob/62cd04cb283fb46580558e17117ed701f90dfcbe/cv/aml-cli-v2/data-science/environment/Dockerfile#L2

When working with Conda, there are no checks on the environment packages

Describe the bug or the issue that you are facing

When creating environments with conda, no checks are performed to verify that the packages exist or that the pip dependencies are correctly installed.

Steps/Code to Reproduce

When creating the environments with conda, there are no checks performed when creating the environment to see if the packages exist or if the pip dependencies are also correctly installed.

az ml environment create --file .\train-conda.yml

train-conda.yml

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: environment.yml
description: Environment created from a Docker image plus Conda environment.

environment.yml

channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.7.5
  - pip
  - pip:
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - sklearn==0.24.1

Expected Output

No error is thrown even though there is no such package as sklearn.

There should be a check to make sure that the environment has been successfully created before moving on to the next steps.
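Independent of the missing validation step, one reason this particular file can never resolve is the package name: on pip, scikit-learn is published as scikit-learn, not sklearn. A corrected environment.yml (versions kept from the issue, offered as a sketch rather than a verified fix) would be:

```yaml
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.7.5
  - pip
  - pip:
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - scikit-learn==0.24.1
```

Note this only fixes the name; it does not add the up-front existence check the issue asks for.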

Versions

running az -v
azure-cli 2.43.0

core 2.43.0
telemetry 1.0.8

Extensions:
account 0.2.5
ml 2.12.1

Dependencies:
msal 1.20.0
azure-mgmt-resource 21.1.0b1

Which platform are you using for deploying your infrastructure?

Azure DevOps (ADO)

If you mentioned Others, please mention which platform you are using?

No response

What are you using for deploying your infrastructure?

Bicep

Are you using Azure ML CLI v2 or Azure ML Python SDK v2?

Azure ML CLI v2

Describe the example that you are trying to run?

Tabular, but this affects all the environments.

[mlops-v2] Is the Quickstart tutorial limited to MS/Azure employees ?

Why?

This repo (mlops-v2) was recommended to us (an MS/Azure client) by our MS business contact to try out Azure MLOps. However, it looks like some steps require specific organization access.

How?

  • Doing the Quickstart
  • Creating the PAT in the Developer Settings of my GH account works fine
  • However, step 2.7 (pictured below) isn't available to us for pretty obvious reasons (we're not part of Azure):
    image

Is there a way to work around this? A public version of the tutorial/setup?

Looking forward to trying it out!
Thanks in advance

How can I create inference clusters?

Hi @setuc, I want to create an inference cluster, i.e. deploy the model to Azure Kubernetes Service for large-scale inferencing. How can I create a compute inference cluster?

Thank you

GitHub Actions instead of DevOps

Do you have a guide (similar to the Quickstart) on how to do everything in GitHub rather than Azure DevOps? Probably only the orchestration part is in Azure DevOps currently.

Moreover, referencing GitHub repos in an Azure DevOps pipeline is currently considered unsafe:
image

@msteller-Ai

[Quickstart.md] Deploying Infrastructure via Azure DevOps

Hi,
I am facing an issue with the deployment via Azure Pipeline. When I run the pipeline, the following error is thrown.

Screenshot_20221130_192350

It seemed like I had to change the resource repositories and connections. So I tried changing them to the following, and another error was thrown.

Screenshot_20221130_192212

Can anyone help me with the issue?

Thanks,
Saiham

Model tagging as Legacy

Hi,

The model which has been trained and registered is now showing as Legacy.

image

We tried creating a few new AutoML models, but those models were also created with the Legacy tag.

Can you please let us know on what basis the model is getting tagged as Legacy?

Thanks

Load testing of endpoint in Azure ML and data sanity tests

I wanted to load test a deployed model endpoint and add it to the MLOps v2 framework, so that newly deployed models go through load testing. I tried the Azure Load Testing resource for web apps, but I think it is just for static content. I also saw a Python package, locust, which seems promising. Is there any other approach?
Also, for data sanity and integration tests for deployed models, great_expectations is an available package. Is there an Azure service or recommended package for the same?
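Whichever tool generates the load (locust, or a plain Python loop hitting the endpoint), the pass/fail gate at the end usually reduces to percentile checks over the recorded latencies. A tool-agnostic sketch; the function names and the SLO threshold are illustrative assumptions, not recommendations:

```python
# Tool-agnostic sketch: summarize latency samples (in ms) collected by any
# load test and gate promotion on a p95 threshold. Threshold values are
# illustrative assumptions.
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def passes_slo(samples, p95_limit_ms=500.0):
    """True if 95% of requests completed within the latency limit."""
    return percentile(samples, 95) <= p95_limit_ms
```

A CD stage could run the load test, collect per-request latencies, and fail the pipeline when passes_slo returns False.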

Kubernetes issue - online deployment

Hi,

For the taxi mlflow model use case (online deployment), I tried using Kubernetes as the compute resource.

As per the Microsoft docs, I created AKS, configured the extensions, and attached it to the ML workspace. Then I tried online deployment via CLI v2, where I am facing the issue below: "InferencingClientCallFailed"

image

When deploying via the UI, for mlflow models too, it asks me to upload a scoring script and environment (which is not the case for managed instances).

image

So, do we need to manually add/configure the scoring script and environment details for mlflow models when we use Kubernetes?

Checkout mlops-project-template@cindy-test to s/mlops-project-template

Describe the bug or the issue that you are facing

Is it correct to use a branch different to main?
Starting: Checkout mlops-project-template@cindy-test to s/mlops-project-template

Steps/Code to Reproduce

Same as https://github.com/Azure/mlops-v2/blob/main/documentation/deployguides/deployguide_ado.md
Section: Set up source repository with Azure DevOps

Expected Output

I don't have an error per se, but it's weird to me that it's using a named branch (cindy-test) instead of main.

Versions

I am using: initialise-project.yml
from mlops-v2 repo

Which platform are you using for deploying your infrastructure?

Azure DevOps (ADO)

If you mentioned Others, please mention which platform you are using?

No response

What are you using for deploying your infrastructure?

Pre-Deployed

Are you using Azure ML CLI v2 or Azure ML Python SDK v2?

Azure ML Python SDK v2

Describe the example that you are trying to run?

I am just following the section:
Setup MLOps V2 and a New MLOps Project in Azure DevOps

from:
https://github.com/Azure/mlops-v2/blob/main/documentation/deployguides/deployguide_ado.md

We use ADO not GHA

Continuous Integration

I think Continuous Integration should be in the middle loop or the outer loop (depending on the definition of the loops). But currently Continuous Integration doesn't belong to any loop. Why?

[repo] Fix versions of all libraries

Why?

If you don't pin the version and there's a library upgrade, the pipeline may break.
Currently the install-CLI action does not pin the version, which means the DevOps agent will pick up the latest Azure ML CLI version while the job/pipeline definitions still follow the old version. This may break the pipeline.

How?
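One possible approach (a sketch, not the repo's actual install step; the version numbers below are illustrative) is to pin both the azure-cli package and the ml extension wherever the agent installs them:

```yaml
# Hypothetical pinned install step for an Azure DevOps pipeline.
# Pick versions matching what the job/pipeline definitions were written against.
- script: |
    pip install azure-cli==2.30.0
    az extension add --name ml --version 2.2.1
  displayName: Install pinned Azure CLI and ml extension
```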

Anything else?

"DeployTrainingPipeline" throws error at the "Connect to AML Workspace" step

To create the infrastructure, I ran an Azure Pipeline using infrastructure/pipelines/tf-ado-deploy-infra.yml, and it worked. Then I used mlops/devops-pipelines/deploy-model-training-pipeline.yml to run the training pipeline on the sample data provided, but it stops at the "Connect to AML Workspace" step with the following error:

(screenshot of the error, 2022-11-30)

Then I checked the workspace that was created automatically by the pipeline and noticed that I don't have access to the Jobs, along with an odd notification:

(screenshots of the workspace notification, 2022-11-30)

That is despite my having "Contributor" access to the subscription, and all those secrets, App registrations, etc. were created without any problem, following the QuickStart tutorial.

Enterprise readiness where Github is not the source control repository

The current Quickstart doesn't cater for organisations that don't use GitHub as their source control repository. These organisations need a way to get started on Azure quickly, with MLOps projects initialised in Azure DevOps itself based on 'MLOps version' and 'project type'.

Expand to add AzureRepos not just GitHub

Would it be possible to expand the example to use Azure Repos as well, not just GitHub? We don't use GitHub at all, but we are very interested in the MLOps accelerator.

Python SDK issue with azure-cli version 2.29

Hey together,

first of all, thanks a lot for the great tool; I am really enjoying version 2 of MLOps. It is very powerful and really helps speed up the MLOps process for customers.

There is a version problem I am currently experiencing with the Python SDK.


How to replicate the problem:

  1. Run sparse_check.sh to create a new classical ML project with the Python SDK
  2. Run the training pipeline

What causes the problem?

The new version of the az ml CLI no longer works with azure-cli version 2.29; it needs at least version 2.30.
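As an aside, if a pipeline wants to guard against this kind of mismatch, version strings need to be compared numerically rather than lexically (as plain strings, "2.9" would wrongly sort above "2.30"). A minimal self-contained sketch:

```python
# Compare dotted version strings numerically; a plain string compare
# would wrongly rank "2.9" above "2.30".
def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, required: str = "2.30.0") -> bool:
    return parse_version(installed) >= parse_version(required)

print(meets_minimum("2.29.0"))  # False: too old for the new az ml extension
print(meets_minimum("2.30.0"))  # True
```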

Where is the problem located?

Version 2.29 of the azure-cli is currently hardcoded in the mlops-template file (install-az-cli.yaml).

What did I try to fix it?

I changed the version in the YAML file to the desired version 2.30, but then I ran into the next problem:
after increasing the version, the Workspace.from_config() method wanted me to use a DeviceLogin inside the pipeline.
That indicates that it somehow does not use the Azure CLI authentication.
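One workaround that may help (a sketch using the v1 azureml-core SDK, assuming that is what the pipeline step uses; it requires a config.json and an agent already logged in via az login) is to pass the CLI credentials to from_config() explicitly instead of letting it fall back to interactive authentication:

```python
# Hypothetical workaround (azureml-core, SDK v1): pass the Azure CLI
# credentials explicitly so Workspace.from_config() does not fall back
# to an interactive device login on the build agent.
from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
ws = Workspace.from_config(auth=cli_auth)
```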

(screenshot after increasing the version)

Some help would be appreciated; I will also keep trying to fix it :)
