
mcw-ml-ops's Introduction

MLOps

This workshop is archived and no longer being maintained. Content is read-only.

Wide World Importers (WWI) delivers innovative solutions for manufacturers. They specialize in identifying and solving problems for manufacturers, ranging from automation to cutting-edge approaches that generate new opportunities. WWI has decades of experience in data science and application development, which until now have been separate units. They would like to unlock greater, long-term value by combining the two units into one and following a single standardized process for operationalizing their innovations.

For this first proof of concept (PoC), WWI is looking to leverage Deep Learning technologies with Natural Language Processing (NLP) techniques to scan vehicle component descriptions for compliance issues with new regulations. The component descriptions are managed via a web application, which takes each description and labels the component as compliant or non-compliant using the trained model. As part of this PoC, WWI wants to ensure the overall process enables them to update both the underlying machine learning model and the web app in one unified pipeline. They also want to be able to monitor the model's performance after it is deployed so they can be proactive about performance issues.
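The labeling flow described above can be sketched in a few lines. This is a purely illustrative stand-in: `classify_description` here is a trivial keyword check taking the place of the trained NLP model, and all names (`NON_COMPLIANT_TERMS`, `label_component`) are hypothetical, not from the workshop code.

```python
# Hypothetical sketch of the web application's labeling flow.
# classify_description stands in for the trained deep learning model.

NON_COMPLIANT_TERMS = {"asbestos", "lead paint", "unshielded"}

def classify_description(description: str) -> str:
    """Label a vehicle component description (model stand-in)."""
    text = description.lower()
    if any(term in text for term in NON_COMPLIANT_TERMS):
        return "non-compliant"
    return "compliant"

def label_component(component: dict) -> dict:
    """What the web application does: attach a compliance label."""
    component["label"] = classify_description(component["description"])
    return component
```

In the actual PoC, the keyword check would be replaced by a call to the deployed inference web service.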

March 2022

Target audience

  • Data Scientists
  • App Developers
  • AI Engineers
  • DevOps Engineers

Abstracts

Workshop

In this workshop, you will learn how Wide World Importers (WWI) can leverage Deep Learning technologies to scan their vehicle specification documents for compliance issues with new regulations and manage the classification through their web application. The entire process, from model creation and application packaging to model and application deployment, needs to occur as one unified, repeatable pipeline.

At the end of this workshop, you will be better able to design and implement end-to-end solutions that fully operationalize deep learning models, inclusive of all application components that depend on the model.

Whiteboard design session

In this whiteboard design session, you will work in a group to design a process Wide World Importers (WWI) can follow for orchestrating and deploying updates to the application and the deep learning model in a unified way. You will learn how WWI can leverage Deep Learning technologies to scan their vehicle specification documents for compliance issues with new regulations. You will standardize the model format to ONNX and observe how this simplifies inference runtime code, enables pluggability of different models, targets a broad range of runtime environments and, most importantly, improves inferencing speed over the native model. You will design a DevOps pipeline to coordinate retrieving the latest best model from the model registry, packaging the web application, and deploying the web application and inferencing web service. You will also learn how to monitor the model's performance after it is deployed so WWI can be proactive about performance issues.

At the end of this whiteboard design session, you will be better able to design end-to-end solutions that will fully operationalize deep learning models, inclusive of all application components that depend on the model.

Hands-on lab

In this hands-on lab, you will learn how Wide World Importers (WWI) can leverage Deep Learning technologies to scan their vehicle specification documents for compliance issues with new regulations. You will standardize the model format to ONNX and observe how this simplifies inference runtime code, enables pluggability of different models, targets a broad range of runtime environments and, most importantly, improves inferencing speed over the native model. You will build a DevOps pipeline to coordinate retrieving the latest best model from the model registry, packaging the web application, and deploying the web application and inferencing web service. After a first successful deployment, you will make updates to both the model and the web application, and execute the pipeline once to achieve an updated deployment. You will also learn how to monitor the model's performance after it is deployed so WWI can be proactive about performance issues.

At the end of this hands-on lab, you will be better able to implement end-to-end solutions that fully operationalize deep learning models, inclusive of all application components that depend on the model.

Azure services and related products

  • Azure Container Instances
  • Azure DevOps
  • Azure Kubernetes Service
  • Azure Machine Learning Service
  • MLOps
  • ONNX

Related references

Help & Support

We welcome feedback and comments from Microsoft SMEs & learning partners who deliver MCWs.

Having trouble?

  • First, verify you have followed all written lab instructions (including the Before the Hands-on lab document).
  • Next, submit an issue with a detailed description of the problem.
  • Do not submit pull requests. Our content authors will make all changes and submit pull requests for approval.

If you are planning to present a workshop, review and test the materials early! We recommend at least two weeks prior.

Please allow 5 - 10 business days for review and resolution of issues.

mcw-ml-ops's People

Contributors

ciprianjichici, codingbandit, dawnmariedesjardins, microsoftopensource, msftgits, roxanagoidaci, setuc, shirolkar, timahenning


mcw-ml-ops's Issues

AttributeError when importing Keras

Azure - Azure trial subscription

Issue:

AttributeError  

Traceback (most recent call last)

<ipython-input-3-9c78a7be919d> in <module>()
      5 
      6 import cv2
----> 7 import keras
      8 import numpy as np
      9 import matplotlib.pyplot as plt

8 frames

/usr/local/lib/python3.7/dist-packages/keras/initializers/__init__.py in populate_deserializable_objects()
     47 
     48   LOCAL.ALL_OBJECTS = {}
---> 49   LOCAL.GENERATED_WITH_V2 = tf.__internal__.tf2.enabled()
     50 
     51   # Compatibility aliases (need to exist in both V1 and V2).

AttributeError: module 'tensorflow.compat.v2.__internal__' has no attribute '__internal__'

File: Deep learning.ipynb

from azureml.widgets import RunDetails
from azureml.train.dnn import TensorFlow

import keras
import tensorflow
from keras.models import load_model

Same issue when converting the Keras model to ONNX

How I fixed it:

Referring to the answer provided here - https://stackoverflow.com/a/67787883

For the import issues, I changed the imports to the following:

import tensorflow
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
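For reference, the `pad_sequences` helper imported above truncates and pads token-id sequences to a fixed length; its `tf.keras` defaults are `padding='pre'` and `truncating='pre'`. A stdlib-only sketch of those same semantics (the function name `pad_sequences_like` is my own, for illustration only):

```python
def pad_sequences_like(seqs, maxlen, value=0):
    """Illustrate tf.keras pad_sequences default behavior:
    truncate from the front, pad at the front with `value`."""
    out = []
    for s in seqs:
        s = list(s)[-maxlen:]                         # 'pre' truncation keeps the tail
        out.append([value] * (maxlen - len(s)) + s)   # 'pre' padding fills the front
    return out
```

For example, `pad_sequences_like([[1, 2], [1, 2, 3, 4]], maxlen=3)` yields `[[0, 1, 2], [2, 3, 4]]`, matching what the notebook's tokenized descriptions look like before being fed to the model.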

Regarding the Keras model to ONNX conversion:
I installed the latest versions of onnxmltools, keras2onnx, onnxruntime, and tf2onnx using:
!pip install onnxmltools keras2onnx onnxruntime tf2onnx

This fixed the issue and helped me complete the workshop.

Deep Learning with Text.ipynb issue

When running the import statements towards the beginning of the notebook, I get the error below.
Using a compute instance created at 9:40 AM EST, June 9th, 2020.
Python 3.6 - AzureML

---------- error -----
Using TensorFlow backend.

AttributeError Traceback (most recent call last)
in
18 from azureml.train.dnn import TensorFlow
19
---> 20 from keras.models import load_model
21 from keras.preprocessing.text import Tokenizer
22 from keras.preprocessing.sequence import pad_sequences

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/__init__.py in
1 from __future__ import absolute_import
2
----> 3 from . import utils
4 from . import activations
5 from . import applications

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/utils/__init__.py in
4 from . import data_utils
5 from . import io_utils
----> 6 from . import conv_utils
7 from . import losses_utils
8 from . import metrics_utils

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/utils/conv_utils.py in
7 from six.moves import range
8 import numpy as np
----> 9 from .. import backend as K
10
11

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/backend/__init__.py in
----> 1 from .load_backend import epsilon
2 from .load_backend import set_epsilon
3 from .load_backend import floatx
4 from .load_backend import set_floatx
5 from .load_backend import cast_to_floatx

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/backend/load_backend.py in
88 elif _BACKEND == 'tensorflow':
89 sys.stderr.write('Using TensorFlow backend.\n')
---> 90 from .tensorflow_backend import *
91 else:
92 # Try and load external backend.

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in
54 get_graph = tf_keras_backend.get_graph
55 # learning_phase_scope = tf_keras_backend.learning_phase_scope # TODO
---> 56 name_scope = tf.name_scope
57
58

AttributeError: module 'tensorflow' has no attribute 'name_scope'

Lab Guide Updates

  1. Before the hands-on lab, Task 4, Step 14: while running the Linux commands, the tar zxvf ../vsts-agent-linux-x64-2.191.1.tar.gz command will throw an error because the downloaded agent version will differ from the one given. An instruction has to be added about this.

  2. Before the hands-on lab, Task 4, Step 5: users will not be able to add the agent pool without creating a project first.
    Instructions for creating the project have to be added to the Before the hands-on lab document.

  3. Exercise 1, Task 4, Step 3: the UI for creating the service connection has been updated as shown below. The screenshot has to be updated.

Git1

  1. Exercise 3, Task 4, Step 2: while filling in the details for the Azure CLI task, the option shown after Display name is Azure Resource Manager connection, but the instructions refer to it as Azure Subscription.

The instruction has to be updated.

Git2

  1. Exercise 3: after adding the Azure CLI and Azure ML Model Deploy tasks to the agent job, there is 1 job and 2 tasks as shown below, but the Exercise 3 screenshots show 1 job and 1 task.

The screenshots have to be updated as shown below.

Git3

  1. Exercise 7, Task 2, Step 6: when checking logs in Application Insights, hovering over the requests table does not show the eyeball icon shown in the lab guide screenshot.

The screenshot has to be updated as below.

Git5

  1. Exercise 7, Task 3, Step 3: to view the model data, users are asked to click on Storage Explorer, but the storage account UI has been updated and the option is now Storage browser (Preview), as shown in the screenshot below.

The screenshot and instruction have to be updated.

Git4

November 2019 - Content update

This workshop is scheduled for a content update. Solliance, please review the workshop and provide your update suggestions for SME review here.

Before the HOL - MLOps - documentation issues

Please fix the following documentation issues in Before the HOL - MLOps:

Issue 1:
In Task 5: Setup Azure DevOps Agent, Step #10, the instructions say to select "Image: Ubuntu Server 18.04 LTS - Gen 1". But Gen 1 doesn't show up by default; the user has to browse all images to select Gen 1. This might confuse the user. Does Gen 2 work OK?

Issue 2:
In Task 5: Setup Azure DevOps Agent, Step #14, az extension add --name azure-cli-ml and curl -O [Download the agent URL copied above] have to be shown as two separate commands. As written, this will confuse the user.

Deploy & Test Webservice inline script

The inline script should be:

python $(System.DefaultWorkingDirectory)/_mlops-quickstart/devops-for-ai/aml_service/deploy.py --service_name $(service_name) --aks_name $(aks_name) --aks_region $(aks_region) --description $(description)

not

python aml_service/deploy.py --service_name $(service_name) --aks_name $(aks_name) --aks_region $(aks_region) --description $(description)

and the working directory should be:
$(System.DefaultWorkingDirectory)/_mlops-quickstart/devops-for-ai

Otherwise, the deployment won't succeed.

Lab failing during Release: failing at the step of creating webservices.

To the best of my knowledge I followed the instructions, and I am having a failure towards the end of the lab where it really matters. The build succeeded, but during the release it fails at the step of creating the web services. Azure DevOps does not provide much to help troubleshoot this.

When I load the same image that was built using devops on ACI, I see this message.

“ModuleNotFoundError: No module named 'ruamel'
Worker exiting (pid: 41)”

May I request your help troubleshooting this and getting the lab working?
I did not want to spend cycles troubleshooting the build process since it's not my specialty (I specialize in Azure App/Infra and DevOps). I would love to get this lab working, as I see great value coming out of it for my customer team, which mainly consists of data scientists who are struggling to piece together the container and DevOps aspects of MLOps. Even if the web service is failing, it's still valuable because I get the process/tasks sorted out. If this is not the right forum, I apologize and would appreciate it if you could direct me to the right support group.

Issue in Exercise 2, task 2

  1. In Exercise 2, Task 2, the pipeline run is failing with the issue shown below. I provided the details in the yml file according to the lab guide.
  2. The cluster was created successfully in the AML workspace, but the run fails as shown below.

image

image

Build Pipeline Error - Exercise 4

Hi everyone,

I'd like to bring an error to your attention in this workshop during Exercise 4. In this step you execute a build pipeline from the repo linked in the instructions. The code here does not have any issues from what I can see. The build pipeline is here for reference and fails at this line.

This is the error message:

Traceback (most recent call last): 
File "aml_service/pipelines_master.py", line 125, in <module> 
pipeline_run.wait_for_completion(show_output=True) 
File "/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/site-packages/azureml/pipeline/core/run.py", line 291, in wait_for_completion 
raise_on_error=raise_on_error) 
TypeError: wait_for_completion() got an unexpected keyword argument 'timeout_seconds' 

Any ideas on a fix?
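One generic workaround sketch, assuming the `TypeError` above comes from a version mismatch between azureml packages (the calling script passes `timeout_seconds`, but the installed `wait_for_completion` does not accept it): filter keyword arguments against the installed signature before calling. The helper name `call_with_supported_kwargs` is hypothetical; aligning the azureml package versions is the proper fix.

```python
import inspect

def call_with_supported_kwargs(func, *args, **kwargs):
    """Drop keyword arguments the installed function does not accept.
    Illustrative workaround for SDK signature mismatches, not a
    replacement for pinning compatible package versions."""
    params = inspect.signature(func).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_var_kw:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)

# Stand-in for an older wait_for_completion lacking `timeout_seconds`:
def wait_for_completion(show_output=True):
    return "finished"
```

With this, `call_with_supported_kwargs(wait_for_completion, show_output=True, timeout_seconds=60)` silently drops the unsupported argument instead of raising a `TypeError`.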

Having an issue during Task 4: Mastering pipeline

I have tried many times and the job stops with: "'Run' object has no attribute 'get_output_data'".

Message: Activity Failed:

{
"error": {
"code": "UserError",
"message": "User program failed with AttributeError: 'Run' object has no attribute 'get_output_data'",
"detailsUri": "https://aka.ms/azureml-known-errors",
"details": [],
"debugInfo": {
"type": "AttributeError",
"message": "'Run' object has no attribute 'get_output_data'",
"stackTrace": " File "azureml-setup/context_manager_injector.py", line 148, in execute_with_context\n runpy.run_path(sys.argv[0], globals(), run_name="main")\n File "/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py", line 263, in run_path\n pkg_name=pkg_name, script_name=fname)\n File "/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File "/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py", line 85, in _run_code\n exec(code, run_globals)\n File "aml_service/pipelines_master.py", line 130, in \n data = pipeline_run.find_step_run('evaluate')[0].get_output_data('evaluate_output')\n"
},
"messageParameters": {}
},

{
"error": {
"message": "Activity Failed:\n{\n "error": {\n "code": "UserError",\n "message": "User program failed with AttributeError: 'Run' object has no attribute 'get_output_data'",\n "detailsUri": "https://aka.ms/azureml-known-errors\",\n "details": [],\n "debugInfo": {\n "type": "AttributeError",\n "message": "'Run' object has no attribute 'get_output_data'",\n "stackTrace": " File \"azureml-setup/context_manager_injector.py\", line 148, in execute_with_context\n runpy.run_path(sys.argv[0], globals(), run_name=\"main\")\n File \"/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py\", line 263, in run_path\n pkg_name=pkg_name, script_name=fname)\n File \"/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File \"/home/vsts/.azureml/envs/azureml_162b9bd49e6546a5993481df5aa0d8e4/lib/python3.6/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File \"aml_service/pipelines_master.py\", line 130, in \n data = pipeline_run.find_step_run('evaluate')[0].get_output_data('evaluate_output')\n"\n },\n "messageParameters": {}\n },\n "time": "0001-01-01T00:00:00.000Z"\n}"
}
}
##[error]Script failed with error: Error: The process '/bin/bash' failed with exit code 1
/opt/hostedtoolcache/Python/3.6.11/x64/bin/az account clear
error.txt

modeldata is empty in Storage account even after multiple webservice endpoint calls.

Exercise 9: Examining deployed model performance
Task 3: Check the data collected

I followed the steps to setup Application Insights and went to Storage Account in the resource group where all the lab resources are placed. Even after multiple successful calls to deployed Model endpoint, there are no results saved in the storage account. modeldata container is empty.

Screenshot 2020-10-01 221656

Before HOL - Task 5: Setup Azure DevOps Agent Step:14

  1. In Before HOL - Task 5: Setup Azure DevOps Agent, Step 14, while running the below command -

    az extension add --name azure-cli-ml

    I get the below error -

    No matching extensions for 'azure-cli-ml'. Use --debug for more information.

    I also tried the available version of azure-cli-ml, i.e., 2.0.3.

    Note: az and azure-cli are installed at the latest version.

    Please find the below screenshot for your reference:

    image

    I'm able to add azure-cli-ml by running the below command:
    az extension add -s https://azurecliext.blob.core.windows.net/release/azure_cli_ml-1.22.0.1-py3-none-any.whl -y

  2. In Exercise 2, Task 2, the pipeline run fails due to "azure-cli-ml", as shown in the below screenshot:

image

Facing Error while building a Pipeline in Azure DevOps.

In Exercise 4, Task 2, while building a pipeline in Azure DevOps, I face an error -
There was a resource authorization issue: "The pipeline is not valid. Could not find a pool with name Hosted Agent. The pool does not exist or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz."
Even after authorizing resources, and although the agent pool is available with online status, the pipeline build fails.
Kindly refer the below attached screenshot :
image

Hands-on-lab instructions: Task 4, Step 2 issue

In Task 4: Add Install the AML CLI task to Test Deployment stage, Step 2 (Provide the following information):
Azure Resource Manager connection: quick-starts-sc does not show up automatically. We need to first authorize the ML workspace. This will confuse users, so please correct it.

Azure Devops issue : not able to import repository

In exercise 3 task 2 https://github.com/microsoft/MCW-ML-Ops/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20MLOps.md#task-2-import-quickstart-code-from-a-github-repo
while importing the repository we face the issue shown below
image
Workaround for the problem is :

  1. Go to Account settings and Select “Preview Features“
  2. Turn off “New Repos landing pages” preview feature.

In Exercise 3, Task 4, Step 3 https://github.com/microsoft/MCW-ML-Ops/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20MLOps.md#task-4-create-new-service-connection
we do not get the option to select the Service principal (automatic) feature, so we had to skip that step; we were still able to test the lab successfully.

az ml folder option is missing

Exercise 4 -> Task 2: Run the Build Pipeline

Pipeline failed with the below error:

ERROR: 'folder' is misspelled or not recognized by the system

The folder option is missing :

(base) xx@RobiDevOpsAgent:~$ az ml -h

Group
    az ml : Manage Azure Machine Learning resources.
        This command group is experimental and under development. Reference and support levels:
        https://aka.ms/CLI_refstatus
Subgroups:
    code        : Manage Azure ML code assets.
    compute     : Manage Azure ML compute resources.
    data        : Manage Azure ML data assets.
    datastore   : Manage Azure ML datastores.
    endpoint    : Manage Azure ML endpoints.
    environment : Manage Azure ML environments.
    job         : Manage Azure ML jobs.
    model       : Manage Azure ML models.
    workspace   : Manage Azure ML workspaces.

versions:

(base) xx@RobiDevOpsAgent:~$ python --version
Python 3.9.5
(base) robi@RobiDevOpsAgent:~$ uname -a
Linux xxDevOpsAgent 5.4.0-1056-azure #58~18.04.1-Ubuntu SMP Wed Jul 28 23:14:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(base) robi@RobiDevOpsAgent:~$ pip -V
pip 21.2.4 from /home/xx/miniconda3/lib/python3.9/site-packages/pip (python 3.9)
(base) robi@RobiDevOpsAgent:~$ az --version
azure-cli                         2.27.2

core                              2.27.2
telemetry                          1.0.6

Extensions:
ml                               2.0.1a5
aks-preview                       0.5.29

Python location '/home/xx/miniconda3/bin/python'
Extensions directory '/home/xx/.azure/cliextensions'

Python (Linux) 3.9.5 (default, Jun  4 2021, 12:28:51)
[GCC 7.5.0]

Legal docs and information: aka.ms/AzureCliLegal


Your CLI is up-to-date.

Deep Learning with Text.ipynb doesn't work.

Deep Learning with Text.ipynb doesn't work.

It fails at line 176, converted_model = onnxmltools.convert_keras(model, onnx_model_name, target_opset=7)

with the error 'tuple' object has no attribute 'layer'.

I think this is caused by a Keras or TensorFlow version update.

Pipeline queued in Exercise 4

Hello, I am trying to run the pipeline following Exercise 4, but it sits in the queue and does not execute. I checked and I don't see other jobs running in parallel.

Image: Ubuntu-18.04
Queued: Just now [manage parallel jobs]

This agent request is not running because you have reached the maximum number of requests that can run for parallelism type 'Microsoft-Hosted Private'. Current position in queue: 1
Job preparation parameters
ContinueOnError: False
TimeoutInMinutes: 60
CancelTimeoutInMinutes: 5
Expand:
  MaxConcurrency: 0
  ########## System Pipeline Decorator(s) ##########

  Begin evaluating template 'system-pre-steps.yml'
Evaluating: eq('true', variables['system.debugContext'])
Expanded: eq('true', Null)
Result: False
Evaluating: resources['repositories']['self']
Expanded: Object
Result: True
Evaluating: not(containsValue(job['steps']['*']['task']['id'], '6d15af64-176c-496d-b583-fd2ae21d4df4'))
Expanded: not(containsValue(Object, '6d15af64-176c-496d-b583-fd2ae21d4df4'))
Result: True
Evaluating: resources['repositories']['self']['checkoutOptions']
Result: Object
Finished evaluating template 'system-pre-steps.yml'
********************************************************************************
Template and static variable resolution complete. Final runtime YAML document:
steps:
- task: 6d15af64-176c-496d-b583-fd2ae21d4df4@1
  inputs:
    repository: self


  MaxConcurrency: 0




Azure ML compute name should be unique in whole Azure region

Task 1 -> Step 10 in HOL has this:

a. Compute name: mlops-compute. When you create a VM, provide a name. The name must be between 2 to 16 characters. Valid characters are letters, digits, and the - character, and must also be unique across your Azure subscription.

However when I create the AML compute it fails with below error:

"Compute with this name already exists. Compute name needs to be unique across all existing computes within an Azure region."

This means that the AML compute name has to be unique across the entire Azure region, not just the subscription.
You could ask users to suffix a unique identifier to their AML compute name.
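The suffixing suggestion can be sketched as follows. The function name and base value are illustrative, not from the lab guide; the length and character rules follow the constraints quoted above (2-16 characters; letters, digits, and `-`).

```python
import uuid

def unique_compute_name(base: str = "mlops-compute", max_len: int = 16) -> str:
    """Suffix a short random identifier so the AML compute name is
    unique across the region, while keeping the 2-16 character limit
    and the letters/digits/'-' rule from the lab guide."""
    suffix = uuid.uuid4().hex[:4]  # 4 random hex characters
    # Trim the base so base + '-' + suffix fits within max_len.
    return f"{base[:max_len - len(suffix) - 1]}-{suffix}"
```

For example, `unique_compute_name()` might return something like `mlops-compu-9f3a`, which is far less likely to collide with another attendee's compute in the same region.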

Issues in Labguide

Please find a few update suggestions below:

  1. A paid DevOps agent is required to complete the lab, whereas the requirements ask for a free Azure DevOps subscription.
  2. Exercise 1, Task 1, Step 10: the screenshot requires an update (the AML UI changed).
  3. Exercise 1, Task 1, Step 11: instructions to authenticate are missing, and the edit option from the instruction/screenshot cannot be found.
  4. Exercise 1, Task 2, Step 1: while running the notebooks, the instructions say to provide subscription_id, resource_group, workspace_name, and workspace_region in the cells, but it is not clear which cell should be updated with these values.

Issue in build pipeline

I am facing an error while running the build pipeline. When I run the .yml file, the command az ml folder attach gives the error below:

ERROR: __init__() got an unexpected keyword argument 'async_persist'

AzureML Compute Job failed

Hi,
Implementing the repository in my DevOps environment, it fails on the AzureML Compute job with the error: Failed to parse Job Spec due to invalid base64 encoding with error "Encoded text cannot have a 6-bit remainder."

Here is the log file (10.txt); any ideas to fix this issue?

Automate your cycle of Intelligence

Katonic MLOps Platform is a collaborative platform with a unified UI to manage all data science activities in one place and introduce MLOps practices into the production systems of customers and developers. It is a collection of cloud-native tools for all of these stages of MLOps:

  • Data exploration
  • Feature preparation
  • Model training/tuning
  • Model serving, testing, and versioning

Katonic is for both data scientists and data engineers looking to build production-grade machine learning implementations, and it can be run either locally in your development environment or on a production cluster. Katonic provides a unified system, leveraging Kubernetes for containerization and scalability and for the portability and repeatability of its pipelines.

It would be great if you could list it on your account.

Website -
Katonic One Pager.pdf

https://katonic.ai/

"Can't import tf2onnx module ..." during keras conversion

Hi,

I found some warnings like the below during the Keras to ONNX conversion:

Saving model files...
model saved in ./outputs/model folder
Saving model files completed.
Can't import tf2onnx module, so the conversion on a model with any custom/lambda layer will fail!
WARNING - Can't import tf2onnx module, so the conversion on a model with any custom/lambda layer will fail!
Model exported in ONNX format...

image

Adding a pip install of tf2onnx should fix the issue, for example:

!pip install tf2onnx==1.6.3

Thanks!

#31

Feedback on workshop content

Hi, I have some feedback related to the workshop content. I couldn't find any other channel to give feedback, so I decided to create an issue.

  1. I think the workshop is missing the central piece of Azure MLOps, that is, the Azure DevOps ML extension. The ML extension allows triggering pipelines after a model has been updated in the Azure ML Workspace, but this feature is not used in this workshop. The Azure ML Workspace is only used to hold the model, for recording purposes only. This created some confusion among workshop participants.

  2. The ONNX model is discussed in the Whiteboard session and it is promised that we will "observe how this simplifies inference runtime code, enabling pluggability of different models and targeting a broad range of runtime environments and most importantly, improves inferencing speed over the native model". However, the Notebooks do not refer to ONNX or use the ONNX runtime. How is the use of ONNX demonstrated during the workshop? Does AML Workspace store the models in ONNX format, out of the box?

Otherwise I like the workshop content, thanks for the good work! :)

Test data prediction: name 'inputs_dc' is not defined

In Exercise 6, Task 4, the script is not able to find test data:
Task 4

Similarly, in Exercise 7, the Test Deployment notebook gives a similar message:
Notebook output

Any idea how to fix this? It's not an error per se, since the release pipeline completes successfully and the notebook does not throw an exception here.
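A defensive pattern for the `inputs_dc` message above, assuming `inputs_dc` is the `ModelDataCollector` instance from the azureml-monitoring package used in the quickstart scoring script (constructor arguments below are illustrative): guard the import and fall back to a no-op collector, so scoring still works and only telemetry is lost when the package is missing.

```python
# Sketch: avoid "name 'inputs_dc' is not defined" in a scoring script.
# If azureml-monitoring is unavailable, substitute a no-op collector.

class _NoOpCollector:
    def collect(self, data):
        pass  # telemetry silently dropped

try:
    from azureml.monitoring import ModelDataCollector
    inputs_dc = ModelDataCollector("model", designation="inputs")
except Exception:
    inputs_dc = _NoOpCollector()

def run(raw_inputs):
    inputs_dc.collect(raw_inputs)   # no NameError either way
    return {"label": "compliant"}   # placeholder for the real inference
```

This matches the observed behavior in the issue: the message is informational rather than fatal, since inference proceeds and only the data collection step is skipped.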

Deploy and Test Webservice step fails for different reasons

Exercise 6: Test Build and Release Pipelines
Task 4: Review Release Pipeline Outputs

#1 Deploy and Test Webservice step fails with error "ServicePrincipalNotFound". See attached log tasklog_8.log
tasklog_8.log

#2 I deleted the failed AKS cluster (maybe consider checking the AKS provisioning state and adding a step to delete bad/failed clusters).

#3 Redeployed Release pipeline. This time AKS Cluster created successfully but it fails while creating AML Endpoint with error "CrashLoopBackOff" container is crashed. See attached tasklog_8 (1).log
tasklog_8 (1).log

Pipeline failure in Exercise4

In Exercise 4, Task 2 (Run the Build Pipeline), Step 3, the pipeline failed.
Please find the screenshots and log file of the pipeline below for reference.

Screenshot (102)

Log file:
1_Job (1).txt

Can you please check on this?

Thanks,
Tejaswini

modeldata container is empty

Exercise 7 (Optional): Examining deployed model performance
I enabled both Application Insights and model data collection as directed in the Model Telemetry notebook.

Then I am on Task 3: Check the data collected

I go to the storage account and see that the modeldata container is empty. I tried refreshing several times, but data won't show up in the modeldata container.

I am able to see the data in application insights though.
