This repository provides Databricks Asset Bundles examples, with added CI/CD pipelines using Azure DevOps.
For more details, see the READMEs in each subfolder, e.g. the default_python README.
To learn more, see:
- The public preview announcement at https://www.databricks.com/blog/announcing-public-preview-databricks-asset-bundles-apply-software-development-best-practices
- The docs at https://docs.databricks.com/dev-tools/bundles/index.html
CI pipeline definitions are in: default_python/azure_devops_pipelines/
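The bundle itself defines one deployment target per environment in its `databricks.yml`. A minimal sketch is shown below; the target names and workspace hosts are illustrative assumptions, not the repo's actual config:

```yaml
# Hypothetical databricks.yml targets block; names and hosts are illustrative
bundle:
  name: default_python

targets:
  dev:
    mode: development       # prefixes resources with the developer's username
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
```

The CI pipelines select a target with `databricks bundle deploy -t <target>`, so each environment gets its own workspace and settings.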
Description:
- The engineer creates a new branch using their IDE of choice, or the Databricks Repos UI
- The engineer makes code changes, runs unit tests locally, and runs integration tests against the dev workspace
- Once the feature is complete, the engineer creates a Pull Request in Azure DevOps Repos
- Azure DevOps Pipelines automatically triggers a run of the Pull Request CI pipeline. This runs tests and deploys the Databricks Asset Bundle (DAB) to the development environment
- Once all automated checks have completed and the pull request has been approved, the engineer completes the pull request to merge the code into the main branch
- Azure DevOps Pipelines automatically triggers a run of the Staging CI pipeline. This tests the code again, but against the Staging environment, which likely has more realistic production data and config, and deploys the DAB to that environment
- (Optional) When the engineer wants to release the code to the production environment, they create a new Pull Request to merge the code from the main branch into the release branch
- Once the Pull Request has been reviewed, it is completed and the code is merged into the release branch
- Azure DevOps Pipelines automatically triggers the Prod CI pipeline, deploying the DAB to production. The production deployment typically includes a scheduled trigger so the job runs at a given time
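The branch-to-pipeline mapping above is expressed through triggers in the pipeline YAML files. As a minimal sketch (assuming the branch names used above, not the repo's exact files), the Staging CI pipeline would run on pushes to main, i.e. whenever a pull request is merged:

```yaml
# Hypothetical trigger block for the Staging CI pipeline:
# runs on every push to main (including completed pull requests)
trigger:
  branches:
    include:
      - main
```

The Prod pipeline would use the same shape with `release` in place of `main`, while the Pull Request pipeline would use a `pr:` trigger (or a branch policy build validation) instead.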
- Create a service principal in Microsoft Entra ID, or in Databricks directly if you don't have SCIM set up. See https://learn.microsoft.com/en-us/azure/databricks/admin/users-groups/service-principals
Alternatively, you can use a Databricks personal access token instead. Change the environment variables in the pipeline files and the Azure DevOps variable group accordingly.
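In the pipeline YAML, these credentials are passed to the Databricks CLI as environment variables. Note that Azure DevOps does not expose secret variables to scripts automatically; they must be mapped into `env` explicitly. A minimal sketch (the step shown is illustrative, not the repo's actual pipeline step):

```yaml
steps:
  - script: databricks bundle deploy -t dev
    displayName: Deploy bundle to dev
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)
      # Secret variables are not passed to scripts unless mapped explicitly
      DATABRICKS_CLIENT_SECRET: $(DATABRICKS_CLIENT_SECRET)
```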
- Go to Azure Pipelines
- Click 'Library'
- Create a new variable group
- Name it `dev-variable-group`
- Add the following variables:
  - `BUNDLE_VAR_notifications_email_address`: Optional email address to use for failure notifications
  - `DATABRICKS_CLIENT_ID`: Service principal client ID used to authenticate with Databricks
  - `DATABRICKS_CLIENT_SECRET`: Service principal secret used to authenticate with Databricks. Mark this as secret to avoid it displaying in the UI
  - `DATABRICKS_CLUSTER_ID`: Used by Databricks Connect to run automated tests against a Databricks interactive cluster
  - `DATABRICKS_HOST`: Databricks host used by the CLI and tests, e.g. https://demo-workspace.cloud.databricks.com/
- Clone this variable group for staging and prod; call these `staging-variable-group` and `prod-variable-group`. Change the values accordingly.
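Each pipeline then pulls in its environment's variable group by name; a sketch, assuming the group names above:

```yaml
# Reference the dev variable group in the pull request pipeline;
# staging/prod pipelines reference their own groups instead
variables:
  - group: dev-variable-group
```

Any variable in the group, including secrets, becomes available to the pipeline as `$(VARIABLE_NAME)`.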
- Go to Azure Pipelines
- Click 'New Pipeline'
- Select Azure Repos Git and then this Git repo
- Select Existing Azure Pipelines YAML file
- Select main branch and default_python/azure_devops_pipelines/azure_pipeline_pull_request.yml
- Run pipeline
- Repeat steps for the staging and production CI pipelines