
microsoft / azdo-databricks


A set of Build and Release tasks for Building, Deploying and Testing Databricks notebooks

License: MIT License

TypeScript 48.83% JavaScript 30.64% Shell 20.53%

azdo-databricks's Introduction

DevOps for Databricks extension

Build Status

This extension brings a set of tasks for you to operationalize build, test and deployment of Databricks Jobs and Notebooks.

Pre-requisites

Use Python Version

To run this set of tasks in your build/release pipeline, you first need to explicitly set a Python version. To do so, use this task as the first task in your pipeline.
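
For example, a minimal first step in an azure-pipelines.yml (the version spec below is illustrative; pick whichever Python 3 version your tasks need):

steps:
# Explicitly select a Python version before any of the Databricks tasks run
- task: UsePythonVersion@0
  displayName: 'Use Python 3.x'
  inputs:
    versionSpec: '3.x'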

Supported Hosted Agents

With the new tasks added to support Scala development, agent support is now defined per task. See each task's documentation to check its compatibility with the available Hosted Agents.

It is strongly recommended that you use Hosted Ubuntu 1604 for your pipelines.

Pipeline Tasks

Configure Databricks CLI

This pipeline task installs and configures the Databricks CLI on the agent. The following steps are performed:

  • Installs databricks-cli using pip (which is why the Use Python Version task is required);
  • Writes a configuration file at ~/.databrickscfg so the CLI knows which Databricks Workspace to connect to.
    • Instead of creating a DEFAULT profile, it creates a profile called AZDO
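
A minimal YAML sketch of this step, assuming the configuredatabricks task name and the url/token input names shown in the pipeline snippets quoted further down this page; the workspace URL and the secret variable name are placeholders:

- task: configuredatabricks@0
  displayName: 'Configure Databricks CLI'
  inputs:
    url: 'https://<region>.azuredatabricks.net'   # placeholder workspace URL
    token: $(databricks-pat)                      # PAT stored as a secret variable (placeholder name)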

Supported Agents

  • Hosted Ubuntu 1604
  • Hosted VS2017

Important: What is done with your Databricks PAT?

Your Databricks Personal Access Token (PAT) is used to grant access to your Databricks Workspace from the Azure DevOps agent that is running your pipeline, whether that agent is Private or Hosted.

Given that the Microsoft Hosted Agents are discarded after one use, your PAT - which was used to create the ~/.databrickscfg - is discarded along with them. This means that your PAT will not be used for anything other than running your own pipeline.

Store your PAT as a variable

It is strongly recommended that you do not pass your Personal Access Token to the task as plain text. Instead, store it as a Secret Variable and reference the variable in the task.
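
For example, with a secret variable named databricks-pat (an illustrative name) defined in the pipeline or in a variable group, the task input only carries the reference, never the raw token:

- task: configuredatabricks@0
  inputs:
    url: $(databricks-url)     # placeholder variable holding the workspace URL
    token: $(databricks-pat)   # secret variable; the PAT itself never appears in the YAML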

Supported Agents

  • Hosted Ubuntu 1604
  • Hosted VS2017

Deploy Notebooks to Workspace

This Pipeline task recursively deploys Notebooks from a given folder to a Databricks Workspace.

Parameters

  • Notebooks folder: a folder that contains the notebooks to be deployed. For example:
    • $(System.DefaultWorkingDirectory)/<artifact name>/notebooks
  • Workspace folder: the folder on the target Workspace to which the notebooks will be published.
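
A hedged YAML sketch of this step, reusing the deploynotebooks task name and input identifiers that appear in the pipeline snippets quoted in the issues below; the folder values are placeholders:

- task: deploynotebooks@0
  displayName: 'Deploy Notebooks to Workspace'
  inputs:
    notebooksFolderPath: '$(System.DefaultWorkingDirectory)/<artifact name>/notebooks'
    workspaceFolder: '/Shared/my-project'   # placeholder target folder in the Workspace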

Supported Agents

  • Hosted Ubuntu 1604
  • Hosted VS2017

Execute $(notebookPath)

Executes a notebook given its workspace path. The parameters are:

  • Notebook path (at workspace): The path to an existing Notebook in the Workspace.
  • Existing Cluster ID: if provided, the task uses the given Cluster to run the Notebook instead of creating a new Cluster.
  • Notebook parameters: if provided, these values override any default parameter values for the notebook. They must be specified in JSON format.
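
A sketch of what this step might look like in YAML. The executenotebook task name comes from the task logs quoted in the issues below, but the input identifiers (notebookPath, existingClusterId, executionParams) are assumptions, not confirmed parameter names:

- task: executenotebook@0
  displayName: 'Execute /Shared/my-notebook'
  inputs:
    notebookPath: '/Shared/my-notebook'   # assumed input name; the path must start with /
    existingClusterId: $(clusterId)       # assumed input name; omit to create a new cluster
    executionParams: '{"env": "test"}'    # assumed input name; JSON notebook parameters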

Supported Agents

  • Hosted Ubuntu 1604
  • Hosted VS2017

Wait for Notebook execution

Makes the Pipeline wait until the Notebook run - invoked by the previous task - finishes.

If the Notebook execution succeeds (status SUCCESS), this task will also succeed.

If the Notebook execution fails (status FAILED), the task (and the Pipeline) will fail.
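
This task typically runs immediately after the execute step. A hedged sketch of the pair (the waitexecution task name comes from an issue below; the assumption here is that it needs no inputs because it picks up the run started by the previous task):

- task: executenotebook@0
  inputs:
    notebookPath: '/Shared/my-notebook'   # assumed input name, see above
- task: waitexecution@0
  displayName: 'Wait for Notebook execution'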

You can access the run URL through the task logs. For example:

2019-06-18T21:22:56.9840342Z The notebook execution suceeded (status SUCCESS)
2019-06-18T21:22:56.9840477Z For details, go to the execution page: https://<region>.azuredatabricks.net/?o=<organization-id>#job/<run-id>/run/1

Start a Databricks Cluster (new!)

This task starts a given Databricks Cluster. It does nothing if the cluster is already started.

Parameters

  • Cluster ID: The ID of the cluster. It can be found in the cluster's URL or in its Tags.
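
A hedged YAML sketch; both the startcluster task name and the clusterid input identifier are assumptions (check the task's own documentation for the exact IDs), and the value is a placeholder:

- task: startcluster@0                  # assumed task name
  displayName: 'Start Databricks Cluster'
  inputs:
    clusterid: '0923-164208-ab1cd23e'   # assumed input name; placeholder cluster ID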

Supported Agents

  • Hosted Ubuntu 1604
  • Hosted VS2017

Install Scala Tools (new!)

Installs the following tools on the Agent:

  • Java SDK
  • Scala
  • SBT
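
For example (the installscalatools task name matches the snippet in the "scala examples" issue below; the assumption here is that no inputs are required just to install the tools on the agent):

- task: installscalatools@0
  displayName: 'Install Scala Tools'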

Supported Agents

  • Hosted Ubuntu 1604

Install Spark

Installs Spark libraries on the agent.

Supported Agents

  • Hosted Ubuntu 1604

Compiles and Installs JAR using SBT (new!)

This task will:

  • Compile a given project using SBT
  • Copy the resulting JAR to the Databricks Cluster
  • Copy a sample dataset file to the Databricks Cluster

Parameters

  • Cluster ID: The ID of the cluster on which you want to install this library.
  • Working Directory: The project directory, where build.sbt lives.
  • JAR package name (overrides build.sbt): The name you want to give to your JAR package. It overrides the value set in build.sbt.
  • Package Version (overrides build.sbt): The version you want to give to your JAR package. It overrides the value set in build.sbt.
  • Scala Version (overrides build.sbt): The Scala version you want to use for this compilation. It overrides the value set in build.sbt.
  • Sample dataset path: The path to a dataset file used for testing purposes.
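
A hedged YAML sketch of this step; the task name and the input identifiers below are assumptions derived from the parameter labels above, not confirmed task IDs, and the values are placeholders:

- task: installjarsbt@0                   # assumed task name
  displayName: 'Compile and install JAR using SBT'
  inputs:
    clusterId: $(clusterId)                                          # assumed input name
    workingDirectory: '$(Build.SourcesDirectory)/my-scala-project'   # placeholder; folder containing build.sbt
    packageName: 'my-library'             # assumed input name; overrides build.sbt
    packageVersion: '0.1.0'               # assumed input name; overrides build.sbt
    scalaVersion: '2.11.12'               # assumed input name; overrides build.sbt
    sampleDatasetPath: 'data/sample.csv'  # assumed input name; placeholder dataset path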

Supported Agents

  • Hosted Ubuntu 1604

Known issues

Fortunately, there are no known issues so far. Please feel free to open a new issue on GitHub if you experience any problems.

Release Notes

Please check the Release Notes page on GitHub.

Contributing

To learn more about how to contribute to this project, please see the CONTRIBUTING page.

azdo-databricks's People

Contributors

microsoft-github-policy-service[bot], microsoftopensource, msftgits, riserrad


azdo-databricks's Issues

Waiting Task : Allow the possibility to configure delay in the while loop

Sometimes the Waiting task fails with a "temporarily unavailable" error:

Error: b'{"error_code":"TEMPORARILY_UNAVAILABLE","message":"The service at /api/2.0/jobs/runs/get is temporarily unavailable. Please try again later."}'

I think the API calls are too close together, which causes this error. In my case there is one second between each call.

Making this delay configurable would be a good improvement.

Passing Tokens from variable group import notebook fails

Observed a weird case: when passing the Access Token directly in the job, the pipeline is successful.
(screenshot: ci-cd-success)

But when I try to pass the Access Token value from a Variable Group (Secret), the "Configure Databricks CLI" job is successful, but the "Deploy Notebook to workspace" job fails with the error "The Notebooks import process failed".
(screenshot: ci-cd-failed)

Configure Databricks task does not fail even if "pip install databricks-cli" fails

Right now "pip install databricks-cli" is failing because databricks-cli depends on tabulate>=0.7.7 and there is an open issue with tabulate 0.8.4 install failing on windows (link). But despite the failure to install databricks-cli, the "Configure Databricks" task still succeeds.

2019-09-25T01:58:16.2362991Z ##[section]Starting: Configure Databricks CLI
2019-09-25T01:58:16.2493382Z ==============================================================================
2019-09-25T01:58:16.2493505Z Task : Configure Databricks
2019-09-25T01:58:16.2493592Z Description : Configure Databricks CLI
2019-09-25T01:58:16.2493659Z Version : 0.5.2
2019-09-25T01:58:16.2493736Z Author : Microsoft DevLabs
2019-09-25T01:58:16.2493983Z Help :
2019-09-25T01:58:16.2494079Z ==============================================================================
2019-09-25T01:58:16.5084259Z [command]C:\hostedtoolcache\windows\Python\3.7.4\x64\python.exe -V
2019-09-25T01:58:16.5261893Z Python 3.7.4
2019-09-25T01:58:16.5279102Z Version: 3.7.4
2019-09-25T01:58:16.5279763Z
2019-09-25T01:58:16.5280285Z Python3 selected. Running...
2019-09-25T01:58:16.5323535Z [command]C:\Users\VssAdministrator\AppData\Roaming\Python\Python37\Scripts\pip.exe install databricks-cli
2019-09-25T01:58:20.0684365Z Collecting databricks-cli
2019-09-25T01:58:20.0685136Z Downloading https://files.pythonhosted.org/packages/51/0b/75dac581d98c493be74df97f3ea515c678da2e4be8cafbaf9cba9f01c309/databricks-cli-0.9.0.tar.gz (45kB)
2019-09-25T01:58:20.0685674Z Collecting click>=6.7 (from databricks-cli)
2019-09-25T01:58:20.0686018Z Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
2019-09-25T01:58:20.0686283Z Collecting requests>=2.17.3 (from databricks-cli)
2019-09-25T01:58:20.0686507Z Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
2019-09-25T01:58:20.0686750Z Collecting tabulate>=0.7.7 (from databricks-cli)
2019-09-25T01:58:20.0686992Z Downloading https://files.pythonhosted.org/packages/76/35/ae65ed1268d6e2a1be141723e5fffdf4a28e4f4e7c1e083709b308998f90/tabulate-0.8.4.tar.gz (45kB)
2019-09-25T01:58:20.0688048Z ERROR: Command errored out with exit status 1:
2019-09-25T01:58:20.0688475Z command: 'c:\hostedtoolcache\windows\python\3.7.4\x64\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\pip-install-bih6hjoh\\tabulate\\setup.py'"'"'; __file__='"'"'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\pip-install-bih6hjoh\\tabulate\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
2019-09-25T01:58:20.0688826Z cwd: C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-bih6hjoh\tabulate\
2019-09-25T01:58:20.0689034Z Complete output (7 lines):
2019-09-25T01:58:20.0689234Z Traceback (most recent call last):
2019-09-25T01:58:20.0689416Z File "<string>", line 1, in <module>
2019-09-25T01:58:20.0689642Z File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-bih6hjoh\tabulate\setup.py", line 25, in <module>
2019-09-25T01:58:20.0689849Z f.write(LONG_DESCRIPTION)
2019-09-25T01:58:20.0690071Z File "c:\hostedtoolcache\windows\python\3.7.4\x64\lib\encodings\cp1252.py", line 19, in encode
2019-09-25T01:58:20.0690275Z return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2019-09-25T01:58:20.0690507Z UnicodeEncodeError: 'charmap' codec can't encode characters in position 5907-5924: character maps to <undefined>
2019-09-25T01:58:20.0690761Z ----------------------------------------
2019-09-25T01:58:20.0690964Z ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
2019-09-25T01:58:20.0691190Z WARNING: You are using pip version 19.2.2, however version 19.2.3 is available.
2019-09-25T01:58:20.0691394Z You should consider upgrading via the 'python -m pip install --upgrade pip' command.
2019-09-25T01:58:20.0691681Z Error while installing databricks-cli: ERROR: Command errored out with exit status 1:
2019-09-25T01:58:20.0694017Z command: 'c:\hostedtoolcache\windows\python\3.7.4\x64\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\pip-install-bih6hjoh\\tabulate\\setup.py'"'"'; __file__='"'"'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\pip-install-bih6hjoh\\tabulate\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
2019-09-25T01:58:20.0694925Z cwd: C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-bih6hjoh\tabulate\
2019-09-25T01:58:20.0695130Z Complete output (7 lines):
2019-09-25T01:58:20.0695333Z Traceback (most recent call last):
2019-09-25T01:58:20.0696031Z File "<string>", line 1, in <module>
2019-09-25T01:58:20.0696248Z File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-bih6hjoh\tabulate\setup.py", line 25, in <module>
2019-09-25T01:58:20.0696484Z f.write(LONG_DESCRIPTION)
2019-09-25T01:58:20.0696686Z File "c:\hostedtoolcache\windows\python\3.7.4\x64\lib\encodings\cp1252.py", line 19, in encode
2019-09-25T01:58:20.0696911Z return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2019-09-25T01:58:20.0697122Z UnicodeEncodeError: 'charmap' codec can't encode characters in position 5907-5924: character maps to <undefined>
2019-09-25T01:58:20.0697350Z ----------------------------------------
2019-09-25T01:58:20.0697904Z ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
2019-09-25T01:58:20.0698145Z WARNING: You are using pip version 19.2.2, however version 19.2.3 is available.
2019-09-25T01:58:20.0698584Z You should consider upgrading via the 'python -m pip install --upgrade pip' command.
2019-09-25T01:58:20.0698773Z
2019-09-25T01:58:20.0713289Z Writing databricks-cli configuration to file: C:\Users\VssAdministrator\.databrickscfg
2019-09-25T01:58:20.0786544Z ##[section]Finishing: Configure Databricks CLI

Vulnerability Issue Found

A vulnerability issue was found in the code. Kindly suggest who the responsible person is to contact so I can report the issue.

Databricks Bearer Token task failed ##[error]Response Code: 503

2020-12-01T10:45:44.6896058Z ##[section]Starting: Databricks Bearer Token
2020-12-01T10:45:44.7078176Z ==============================================================================
2020-12-01T10:45:44.7078584Z Task : Databricks Create Bearer Token
2020-12-01T10:45:44.7079064Z Description : Databricks Create Bearer Token and outputs back to a variable called $(BearerToken) which you can reuse in later steps
2020-12-01T10:45:44.7079456Z Version : 0.9.2916
2020-12-01T10:45:44.7079692Z Author : Data Thirst Ltd
2020-12-01T10:45:44.7081049Z Help : Creates a variable called $(BearerToken) in your pipeline containing a new Databricks Token
2020-12-01T10:45:44.7081519Z ==============================================================================
2020-12-01T10:46:04.8509908Z Tools Version: 2.1.2915
2020-12-01T10:46:06.9136334Z ##[error]Response Code: 503
2020-12-01T10:46:07.2438285Z ##[section]Finishing: Databricks Bearer Token

Install Scala Tools Task Fails

I see that apt-get update and apt-get install sbt -y are failing.

Here is the snippet from the log:

2021-07-09T12:39:16.9171074Z sudo DEBIAN_FRONTEND=noninteractive apt-get update
2021-07-09T12:39:16.9171437Z ===================
2021-07-09T12:39:17.0271638Z Hit:1 http://azure.archive.ubuntu.com/ubuntu focal InRelease
2021-07-09T12:39:17.0273113Z Hit:2 http://azure.archive.ubuntu.com/ubuntu focal-updates InRelease
2021-07-09T12:39:17.0274045Z Hit:3 http://azure.archive.ubuntu.com/ubuntu focal-backports InRelease
2021-07-09T12:39:17.0329542Z Hit:4 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu focal InRelease
2021-07-09T12:39:17.0713699Z Err:5 https://dl.bintray.com/sbt/debian InRelease
2021-07-09T12:39:17.0714621Z 403 Forbidden [IP: 35.157.24.53 443]
2021-07-09T12:39:17.0787262Z Hit:6 https://packages.microsoft.com/ubuntu/20.04/prod focal InRelease
2021-07-09T12:39:17.1729795Z Hit:7 http://security.ubuntu.com/ubuntu focal-security InRelease
2021-07-09T12:39:18.7422803Z Reading package lists...
2021-07-09T12:39:18.7615348Z E: Failed to fetch https://dl.bintray.com/sbt/debian/InRelease 403 Forbidden [IP: 35.157.24.53 443]
2021-07-09T12:39:18.7617016Z E: The repository 'https://dl.bintray.com/sbt/debian InRelease' is not signed.
2021-07-09T12:39:18.7634938Z ===================
2021-07-09T12:39:18.7636092Z sudo DEBIAN_FRONTEND=noninteractive apt-get install sbt -y
2021-07-09T12:39:18.7636959Z ===================
2021-07-09T12:39:18.8304314Z Reading package lists...
2021-07-09T12:39:18.9896248Z Building dependency tree...
2021-07-09T12:39:18.9912572Z Reading state information...
2021-07-09T12:39:19.1205822Z E: Unable to locate package sbt
2021-07-09T12:39:19.1260499Z ##[debug]Exit code 100 received from tool '/usr/bin/bash'
2021-07-09T12:39:19.1271494Z ##[debug]STDIO streams have closed for tool '/usr/bin/bash'
2021-07-09T12:39:19.1325817Z ##[error]Bash exited with code 100
2021-07-09T12:39:19.1337278Z ##[debug]Processed: ##vso[task.issue type=error;]Bash exited with code 100
2021-07-09T12:39:19.1337986Z ##[debug]task result: Failed
2021-07-09T12:39:19.1341905Z ##[debug]Processed: ##vso[task.complete result=Failed;done=true;]
2021-07-09T12:39:19.1344268Z ##[section]Finishing: Install Scala Tools

The issue looks similar to the one mentioned at sbt/sbt#4507.

Ask: Please help in fixing the above issue.

InstallScalaTools.log

InvalidConfigurationError after Configure CLI

I run the following tasks:

  • Use Python 3.7
  • Configure Databricks CLI
    • $(HOST)
    • $(TOKEN)
  • Bash: databricks clusters list

The bash script gives the following error:

Error: InvalidConfigurationError: You haven't configured the CLI yet!

If I replace Configure Databricks CLI with the following bash:

  • Use Python 3.7
  • Bash
databricks configure --token <<EOF
$(HOST)
$(TOKEN)
EOF
  • Bash: databricks clusters list

Only in this case does the Databricks CLI work as expected.
Why isn't the Databricks CLI finding the configuration when using the Configure Databricks CLI task?
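
Note: per the Configure Databricks CLI section above, the task writes a profile named AZDO rather than DEFAULT, so a bare databricks command in a later Bash step will not find a default profile. A hedged sketch of a step that targets that profile explicitly:

- bash: databricks clusters list --profile AZDO
  displayName: 'List clusters using the AZDO profile'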

Service Connection for Azure Databricks

Is it possible to create and support an Azure Databricks Service Connection for this extension? It would help to:

  1. use a service principal credential for authentication rather than relying on PAT tokens.
  2. keep the credentials at the service connection level instead of in the pipelines.
  3. avoid creating a new Configure CLI task for every pipeline.

Feature request. Implement usage of Jobs API (api/2.0/jobs)

Hi, I find this extension very good and handy.

But I would also like the ability to create and execute jobs using this extension, without creating notebooks, for instance as described here: https://docs.databricks.com/dev-tools/api/latest/examples.html#create-a-spark-submit-job

This functionality may already be implemented in the cluster creation task; I would appreciate any advice on how to create and execute jobs with this extension without using notebooks.

Feature Request - Separate "Compile and installs JAR using SBT" into multiple tasks

We use a gateway between the CI build and the CD deployment into prod environments. It would be cleaner for us to build a JAR using SBT during the build process, to ensure the build is successful and the tests pass, and then deploy the resulting JAR artefact into multiple Databricks clusters later. Separating the deployment of data into environments could also be helpful.

Happy to contribute to the refactoring.

Cannot see these in the task list

I cannot see DevOps for Azure Databricks in the task list of Azure DevOps, though I installed it the day before yesterday. It is still listed in the marketplace. Could you please help me with this?

Configure Databricks CLI failing

In Azure DevOps, the step below is failing.
Configure Databricks CLI is showing this error:

[command]C:\hostedtoolcache\windows\Python\3.7.9\x64\Scripts\pip.exe install databricks-cli
Error while installing databricks-cli
Writing databricks-cli configuration to file: C:\Users\VssAdministrator\.databrickscfg

Since Configure Databricks CLI is failing, the subsequent tasks also fail.

Executing other Databricks CLI commands

These tools are great, but I'd like to be able to use other CLI commands as well. One example is to be able to deploy new Jobs from source control.

At the moment, I can deploy the Notebooks but when I try to run other commands using CLI, it tells me I haven't configured databricks yet.

Questions:

  1. Is there an existing module/DevOps task to execute generic databricks CLI commands?
  2. Is there a way to execute them using, e.g., the standard CLI?

Waiting Task : INTERNAL ERROR not caught

The Waiting task does not catch all errors.
In my case, I inadvertently launched a missing file. The Databricks job failed with an internal error, but the task does not catch it, so it continues to wait.

Here is the JSON response:

{
  "job_id": 1,
  "run_id": 1,
  "number_in_job": 1,
  "original_attempt_run_id": 1,
  "state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "state_message": "Notebook not found: /Shared/main.py"
  },
  "task": {
    "notebook_task": {
      "notebook_path": "/Shared/main.py"
    }
  },
  "cluster_spec": {
    "new_cluster": {
      "spark_version": "5.3.x-scala2.11",
      "node_type_id": "Standard_DS3_v2",
      "enable_elastic_disk": true,
      "num_workers": 2
    }
  },
  "start_time": 1570602949211,
  "setup_duration": 0,
  "execution_duration": 0,
  "cleanup_duration": 0,
  "trigger": "ONE_TIME",
  "creator_user_name": "john_doe@john_doe.fr",
  "run_name": "AzDO Execution",
  "run_page_url": "https://westeurope.azuredatabricks.net/?o=9999999999999#job/1/run/1",
  "run_type": "JOB_RUN"
}

Spark Runtime used is no longer supported

"spark_version": "5.3.x-scala2.11"

This version of Spark is outdated and does not work with Delta libs in version 6+.
It also only runs Python 2; please make this configurable for Python 3 as well.

Please include the SCIM APIs functions as additional tasks

I found this PowerShell module (https://github.com/gbrueckl/Databricks.API.PowerShell) which has implemented the SCIM API functions, which are helpful for managing permissions, creating groups, secret scopes and so on. Since neither Microsoft nor Databricks have any UI to manage or add these, we have to do it through the REST APIs.

If additional tasks could be made for the most used APIs, or for all of them, it would be a great improvement to this DevOps extension.

Notebook deploy task succeeds even when failure due to permissions occurs

Even though a PERMISSION_DENIED occurred, the Deploy Notebooks task reported success.

It could be the same issue as #18.

2020-09-28T02:06:18.5998033Z ##[section]Starting: Deploy Notebooks to Workspace
2020-09-28T02:06:18.6117200Z ==============================================================================
2020-09-28T02:06:18.6117265Z Task         : Deploy Databricks Notebooks
2020-09-28T02:06:18.6117321Z Description  : Recursively deploys Notebooks from given folder to a Databricks Workspace
2020-09-28T02:06:18.6117358Z Version      : 0.5.2
2020-09-28T02:06:18.6117392Z Author       : Microsoft DevLabs
2020-09-28T02:06:18.6117425Z Help         : 
2020-09-28T02:06:18.6117478Z ==============================================================================
2020-09-28T02:06:18.9059294Z [command]C:\hostedtoolcache\windows\Python\3.7.4\x64\python.exe -V
2020-09-28T02:06:18.9221320Z Python 3.7.4
2020-09-28T02:06:18.9244157Z Version: 3.7.4
2020-09-28T02:06:18.9244417Z 
2020-09-28T02:06:18.9245965Z Python3 selected. Running...
2020-09-28T02:06:18.9324826Z [command]C:\hostedtoolcache\windows\Python\3.7.4\x64\Scripts\databricks.exe workspace import_dir -o --profile AZDO F:\AgentA\_work\r293\a\_dp-app-car\drop\notebook /deploy-ci/src/app/
2020-09-28T02:06:20.5225447Z {'error_code': 'PERMISSION_DENIED', 'message': '[email protected] does not have Manage permissions on /. Please contact the owner or an administrator for access.'}
2020-09-28T02:06:20.5225684Z {'error_code': 'PERMISSION_DENIED', 'message': '[email protected] does not have Manage permissions on /. Please contact the owner or an administrator for access.'}
2020-09-28T02:06:20.5225723Z 
2020-09-28T02:06:20.5306571Z ##[section]Finishing: Deploy Notebooks to Workspace

Notebooks failed to deploy but release succeeded

Notebooks failed to deploy with error in logs on agent:

{'error_code': 'TEMPORARILY_UNAVAILABLE', 'message': 'Authentication is temporarily unavailable. Please try again later.'}

This is in the "Deploy Notebooks to Workspace" step. The overall step succeeded but I would have expected it to fail.

Task "Deploy notebooks to workspace" skips non-notebook files

We recently enabled Files in Repos on our Azure Databricks development environment, which has worked well for us so far. Today, I noticed a problem when using the "Deploy notebooks to workspace" task. For one, it converts all Python files to notebooks, regardless of whether they actually are notebooks. Moreover, it skips other non-notebook files, such as JSON files, that are needed for proper functioning on the test environment we are deploying to (and the production environment, once it passes test).

Is there a way to work around this issue?

Quotes are incorrectly escaped in notebook parameters

A notebook has a widget called args that receives a json object as a value.
A correct execution requires something like this:

{"args":"{\"key1\":\"value1\",\"key2\":\"value2\"}"}

But when this text is put in the Execute Notebook task, Databricks receives this content:

{"args":"{\\key1\\:\\value1\\,\\key2\\:\\value2\\}"}

(Quotes are replaced by double backslashes and the json becomes invalid)

Is this project abandoned?

Noticed that the extension was unpublished from the marketplace within the past week or so. Since this project hasn't received much love, one has to wonder whether it has been abandoned.

[error]Unexpected token W in JSON at position 0

We are trying to execute a Databricks Notebook from a CI/CD pipeline. It was working fine last Monday (18th October 2021), but now it throws the following error. Could you please let us know the steps to fix this issue?

2021-10-19T17:07:44.1187852Z [command]C:\hostedtoolcache\windows\Python\3.10.0\x64\Scripts\databricks.exe jobs create --json-file D:\a_tasks\executenotebook_ac263826-c64e-4f2d-b7ce-5f7e777fd8bc\0.5.2\job-configuration.json --profile AZDO
2021-10-19T17:07:44.6513431Z WARN: Your CLI is configured to use Jobs API 2.0. In order to use the latest Jobs features please upgrade to 2.1: 'databricks jobs configure --version=2.1'. Future versions of this CLI will default to the new Jobs API. Learn more at https://docs.databricks.com/dev-tools/api/latest/jobs.html
2021-10-19T17:07:44.6514792Z {
2021-10-19T17:07:44.6515219Z "job_id": 16204911
2021-10-19T17:07:44.6515595Z }
2021-10-19T17:07:44.6568609Z ##[error]Unexpected token W in JSON at position 0
2021-10-19T17:07:44.6611871Z ##[section]Finishing: Execute /deployments/common/CICD_Exec/CICD


Thanks in Advance.

Regards,
Karthick Babu T G

Error: AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'

We have the following tasks, which are failing with the error below:
Tasks:
#===== Configure Databricks CLI =====
- task: configuredatabricks@0
  displayName: Configure Databricks CLI
  inputs:
    url: $(databricks_url)
    token: $(sm-dbw-access-token)
#===== Deploy Notebooks to Workspace =====
- task: deploynotebooks@0
  displayName: Deploy Notebooks to Workspace
  inputs:
    notebooksFolderPath: $(Pipeline.Workspace)/drop/notebook
    workspaceFolder: /NOTEBOOKS/etl1

error:
Starting: Deploy Notebooks to Workspace

Task : Deploy Databricks Notebooks
Description : Recursively deploys Notebooks from given folder to a Databricks Workspace
Version : 0.5.6
Author : Microsoft DevLabs
Help :

/opt/hostedtoolcache/Python/3.10.6/x64/bin/python -V
Python 3.10.6
Version: 3.10.6

Python3 selected. Running...
/opt/hostedtoolcache/Python/3.10.6/x64/bin/databricks workspace import_dir -o --profile AZDO /home/vsts/work/1/drop/notebook /NOTEBOOKS/etl1
Error: AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'

##[error]The Notebooks import process failed.
Finishing: Deploy Notebooks to Workspace

We used the following workaround to mitigate the issue:
databricks/databricks-cli#634

We also think the issue could be caused by
urllib3/urllib3@ba59347

Can you please confirm whether this is causing the error, and provide a solution so we can avoid that workaround and have a permanent fix for the issue?

install scala is broken - bintray no longer exists

The Scala sbt task throws an error:

Hit:1 http://azure.archive.ubuntu.com/ubuntu focal InRelease
Hit:2 http://azure.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:3 http://azure.archive.ubuntu.com/ubuntu focal-backports InRelease
Hit:4 http://azure.archive.ubuntu.com/ubuntu focal-security InRelease
Hit:5 https://packages.microsoft.com/ubuntu/20.04/prod focal InRelease
Hit:6 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu focal InRelease
Ign:7 https://dl.bintray.com/sbt/debian InRelease
Err:8 https://dl.bintray.com/sbt/debian Release
Could not handshake: The TLS connection was non-properly terminated. [IP: 3.122.203.194 443]
Reading package lists...
E: The repository 'https://dl.bintray.com/sbt/debian Release' does not have a Release file.

sudo DEBIAN_FRONTEND=noninteractive apt-get install sbt -y

Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package sbt
##[error]Bash exited with code 100
Finishing: Running installscalatools

It would be better to install via:
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install sbt

scala examples

First of all, thanks for the great library!

However, are there any examples of how to use this in Azure Pipelines?

For example, if I want to install a Scala JAR on a Databricks cluster, what should the arguments look like exactly? Thanks.

- task: installscalatools@0
  inputs:
    clusterID: 'some-cluster-id'
  displayName: 'Running installscalatools'

Support of azdo-databricks repo

Is this repo currently being supported? The last commit was 10 months ago, last PR completed July, 2019. There are some updates to cluster versions required to ensure the created AzDo jobs continue to function and are secure. #23

Support for Increment Deploy

We are looking to deploy only the notebooks that changed, i.e. an incremental deploy. Does the extension support incremental deploys?

Bug in Start Cluster

Issue: If an invalid cluster id is sent to Start Cluster, the task does not exit with an error. The error shows in the log, but the task then gets hung in a retry loop.

Steps to reproduce:

  • Create or use an existing Databricks cluster.
  • Add the Configure Databricks CLI task with correct settings.
  • Add the Start Cluster task, setting an invalid cluster id (lorumipsum, for example)
  • Run the build or release.

It produces a log like the one below, never exiting until the task is force-cancelled:

2019-09-11T22:19:32.5433396Z ##[section]Starting: Starting Cluster lorumipsum
2019-09-11T22:19:32.5539474Z ==============================================================================
2019-09-11T22:19:32.5539630Z Task         : Start a Databricks Cluster
2019-09-11T22:19:32.5539723Z Description  : Make sure a Databricks Cluster is started
2019-09-11T22:19:32.5539823Z Version      : 0.5.2
2019-09-11T22:19:32.5539898Z Author       : Microsoft DevLabs
2019-09-11T22:19:32.5539996Z Help         : 
2019-09-11T22:19:32.5540104Z ==============================================================================
2019-09-11T22:19:37.5287181Z parse error: Invalid numeric literal at line 1, column 6
2019-09-11T22:19:37.5382200Z Cluster lorumipsum not running, turning on...
2019-09-11T22:19:38.3798571Z Error: b'{"error_code":"INVALID_PARAMETER_VALUE","message":"Cluster lorumipsum does not exist"}'
2019-09-11T22:19:39.2306530Z parse error: Invalid numeric literal at line 1, column 6
2019-09-11T22:20:09.5652770Z Starting...
2019-09-11T22:20:10.2523779Z parse error: Invalid numeric literal at line 1, column 6
2019-09-11T22:20:40.3022300Z Starting...
2019-09-11T22:20:41.2190210Z parse error: Invalid numeric literal at line 1, column 6
2019-09-11T22:21:11.2460626Z Starting...
2019-09-11T22:21:12.1989743Z parse error: Invalid numeric literal at line 1, column 6
2019-09-11T22:21:24.5662474Z ##[error]The operation was canceled.
2019-09-11T22:21:24.5667493Z ##[section]Finishing: Starting Cluster lorumipsum

Notebook Execution Fails if Notebook Path contains a Space

Notebook execution fails if the notebook path contains a space, or it is unclear how to escape it correctly.
When I try to put the notebook path in quotes, the error says that the path has to start with a forward slash.

2020-05-08T09:01:32.7176503Z Python3 selected. Running...
2020-05-08T09:01:32.7299400Z ##[error]The Notebook path must start with a forward slash (/).
2020-05-08T09:01:32.7378657Z ##[section]Finishing: Execute "00 Current Release executable"

2020-05-08T08:57:39.7170139Z Python3 selected. Running...
2020-05-08T08:57:39.7334767Z [command]C:\hostedtoolcache\windows\Python\3.7.6\x64\Scripts\databricks.exe workspace ls /00 Workspace Setup - overwritten during deployment/01 Setup General/10 Release Scripts --profile AZDO
2020-05-08T08:57:40.7001318Z Error: b'{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"Path (/00) doesn't exist."}'
2020-05-08T08:57:40.7001929Z
2020-05-08T08:57:40.7035981Z ##[error]Error while fetching Databricks workspace folder.
2020-05-08T08:57:40.7047074Z Checking if 00 Current Release executable existis under /00 Workspace Setup - overwritten during deployment/01 Setup General/10 Release Scripts...
2020-05-08T08:57:40.7048045Z Notebook: Error: b'{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"Path (/00) doesn't exist."}'
2020-05-08T08:57:40.7048604Z Notebook:

It works fine when the Notebook path does not contain the Space character.

New Cluster scripts require updates

In order to support later features such as Databricks Delta, and be compatible with long term support, the cluster should be upgraded. Upgrade the new job cluster script or provide an option to select cluster version as part of the execution?

https://github.com/microsoft/azdo-databricks/blob/master/tasks/ExecuteNotebook/ExecuteNotebookV1/job-configuration/new-cluster.json

https://docs.databricks.com/release-notes/runtime/releases.html

6.4 should have Delta features.

Configure Databricks CLI : Unable to locate executable file: 'python'

Hello Experts,

I am creating an Azure DevOps release pipeline to deploy Databricks notebooks, using the Configure Databricks CLI and Databricks Notebook deployment tasks, but Configure Databricks CLI is failing with the error below.

##[error]Unhandled: Unable to locate executable file: 'python'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.
##[debug]Processed: ##vso[task.issue type=error;]Unhandled: Unable to locate executable file: 'python'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.

I used the Use Python Version task at the beginning, but it did not resolve my issue. Please advise.

Unable to locate executable file: 'python' on self hosted agent even though Python is installed on it.

I am using the 'Deploy Databricks Notebooks' task in a release pipeline, and the prerequisite is to have Python installed on the self-hosted agent. The 'Use Python version' task runs fine and logs: Prepending PATH environment variable with directory: C:\hostedtoolcache\windows\Python\3.8.0\x64. But when the 'Deploy Databricks Notebooks' task runs after this, it throws the error below: 'Unhandled: Unable to locate executable file: 'python'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.' The Azure DevOps self-hosted agent is running on Windows Server 2016 64-bit OS.

max_retries

By default, the Execute Databricks Notebook task applies a retry count of 1. Is it possible to change this? I would like to set it to 0 for some of my executions.

Wait task hangs indefinitely if executing non-existent notebook

If you run the executenotebook task supplying a notebookPath to a notebook that doesn't exist, it succeeds (expected behaviour: fail).

If you then run the waitexecution task, the task hangs indefinitely (expected behaviour: fail), as it doesn't seem to understand the error message:

  "state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "state_message": "Notebook not found: /NotebookDoesNotExist"
  },

Enhancement request for new tasks

Hi, this is a great extension.

Please consider adding tasks for the functions below. We are currently doing these with Powershell scripts.

Create Cluster
Create Secret Scope
Create Secret
Delete Cluster
Get Cluster Id from Cluster Name
Remove Workspace
Stop Cluster
Attach Libraries

Best regards,
Andrew Stephens

Deploy Notebooks within a Directory

The Deploy Databricks Notebook task has an error when the notebooks are within multiple directories/folders. Is there a way to bypass this error, or will this be a feature that needs to be added?

Thank you!

Execute Notebook - Creating the job is failing on Windows

https://dev.azure.com/serradas-msft/DevOps%20for%20Databricks/_apps/hub/ms.vss-releaseManagement-web.cd-release-progress?_a=release-environment-logs&releaseId=64&environmentId=64

2019-06-18T14:14:00.6507071Z [command]C:\hostedtoolcache\windows\Python\3.7.3\x64\Scripts\databricks.exe jobs create --json-file D:\a\_tasks\executenotebook_ac263826-c64e-4f2d-b7ce-5f7e777fd8bc\0.2.23\job-configuration.json --profile AZDO
2019-06-18T14:14:00.9427925Z Error: JSONDecodeError: Invalid \escape: line 7 column 27 (char 181)
2019-06-18T14:14:00.9431659Z ##[debug]task result: Failed
2019-06-18T14:14:00.9488079Z ##[error]Databricks Job creation failed with
2019-06-18T14:14:00.9495918Z ##[debug]Processed: ##vso[task.issue type=error;]Databricks Job creation failed with
2019-06-18T14:14:00.9496514Z ##[debug]Processed: ##vso[task.complete result=Failed;]Databricks Job creation failed with
2019-06-18T14:14:00.9496774Z ##[debug]task result: Failed
2019-06-18T14:14:00.9497053Z ##[error]The job creation failed.
2019-06-18T14:14:00.9497187Z ##[debug]Processed: ##vso[task.issue type=error;]The job creation failed.
2019-06-18T14:14:00.9497431Z ##[debug]Processed: ##vso[task.complete result=Failed;]The job creation failed.
