databricks / databricks-asset-bundles-dais2023
License: Other
Getting this error when running the GitHub Actions workflow:
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
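The usual fix is to bump the checkout action in the workflow file (the workflow filename and surrounding steps are assumed here):

```yaml
# Sketch of the fix in .github/workflows/<your-workflow>.yml (path assumed):
steps:
  - uses: actions/checkout@v4   # v4 runs on Node.js 20
```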
Hi,
I would like to upload an existing JAR as a dependent library to a job/workflow without having to sync any other files/folders.
Currently, all files/folders are always synchronized, but I only need the JAR in the target/scala-2.12 folder.
sync:
  include:
    - target/scala-2.12/*.jar
Folder structure:
.
├── README.md
├── build.sbt
├── databricks.yml
├── src
│   └── main
│       ├── resources
│       │   └── ...
│       └── scala
│           └── ...
└── target
    ├── global-logging
    └── scala-2.12
        └── xxxxxxxxx-assembly-x.x.x.jar
With dbx, this was possible by using file references.
What is the recommended way to do this via DAB, without syncing other files/folders?
I expected this to be possible via artifacts, but that seems to be (for now?) only intended for Python wheels.
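One hedged sketch, assuming a CLI version that accepts local jar paths in task libraries (job name, task key, and main class are hypothetical):

```yaml
# Restrict sync to the assembly jar and reference it as a task library.
sync:
  include:
    - target/scala-2.12/*.jar

resources:
  jobs:
    my_jar_job:                   # hypothetical job name
      tasks:
        - task_key: run_jar
          spark_jar_task:
            main_class_name: com.example.Main   # hypothetical class
          libraries:
            - jar: ./target/scala-2.12/xxxxxxxxx-assembly-x.x.x.jar
```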
Hello,
When I try to run the 'databricks bundle deploy' command I get the following error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x30 pc=0xf457db]
goroutine 1 [running]:
github.com/databricks/cli/bundle/config.(*Resources).SetConfigFilePath(...)
github.com/databricks/cli/bundle/config/resources.go:118
github.com/databricks/cli/bundle/config.(*Root).SetConfigFilePath(0xc000406840, {0xc000532800, 0x86})
github.com/databricks/cli/cmd/bundle/variables.go:11 +0x29
github.com/spf13/cobra.(*Command).execute(0xc000026f00, {0x1e67d40, 0x0, 0x0})
github.com/spf13/[email protected]/command.go:925 +0x7f6
github.com/spf13/cobra.(*Command).ExecuteC(0xc000004900)
github.com/spf13/[email protected]/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).ExecuteContextC(...)
github.com/spf13/[email protected]/command.go:1001
github.com/databricks/cli/cmd/root.Execute(0x1690ff8?)
github.com/databricks/cli/cmd/root/root.go:99 +0x5b
main.main()
github.com/databricks/cli/main.go:11 +0x2a
I checked my Databricks CLI version and it is v0.206.0, so that shouldn't be the issue.
Kind regards,
Vincent
Hello,
I managed to create a bundle and deploy it into a dev environment using the 'databricks bundle deploy' command in Visual Studio Code. This command worked without any issues the first time I used it and deployed everything correctly. After the first deploy I wanted to test what would happen if I deleted the deployed .bundle folder from my Workspace Shared folder and ran the deploy again. I ran the 'databricks bundle deploy' command again without errors, but when I checked the Workspace, '.bundle/dev/DatabricksDreamTeamBundle/files' was deployed but completely empty...
Running the deploys to the stg and prod environments using Azure DevOps Build Pipelines worked without issues and did not show this behaviour (I did the exact same test there), so I wonder why deploying via Visual Studio Code gives different results. Has anyone seen this happen and does anyone know what could be causing it?
Below is the bundle.yaml file I'm using:
bundle:
  name: DatabricksDreamTeamBundle

workspace:
  host: https://adb-xxxxxxxxxx.x.azuredatabricks.net/
  root_path: /Shared/.bundle/${bundle.environment}/${bundle.name}

resources:
  pipelines:
    # A DLT pipeline that processes the data from bronze to gold
    sales_pipeline:
      name: "[${bundle.environment}] DLT Sales"
      target: "sales_${bundle.environment}"
      libraries:
        - notebook:
            path: "./20_bronze/sales_bronze.py"
        - notebook:
            path: "./30_silver/sales_silver.py"
        - notebook:
            path: "./40_gold/sales_gold.py"
      channel: preview
  jobs:
    # A two-task Databricks Workflow - Ingestion + DLT pipeline
    sales_job:
      name: "[${bundle.environment}] Job Sales"
      tasks:
        - task_key: "${bundle.environment}_sales_ingestion_notebook"
          notebook_task:
            notebook_path: "./10_ingestion/sales_ingestion${bundle.environment}.py"
          new_cluster:
            spark_version: 13.1.x-scala2.12
            num_workers: 1
            node_type_id: Standard_DS3_v2
        - task_key: dlt_sales_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.sales_pipeline.id}
          depends_on:
            - task_key: "${bundle.environment}_sales_ingestion_notebook"

targets:
  dev:
    default: true
    resources:
      pipelines:
        sales_pipeline:
          development: true
  stg:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          name: "[${bundle.environment}] DLT Sales"
          target: "sales_${bundle.environment}"
          channel: preview
          libraries:
            # Adding a notebook to the DLT pipeline that tests the data
            - notebook:
                path: "./50_tests/10_integration/DLT-Pipeline-Test.py"
          development: true
  prod:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          development: false
          # Update the cluster settings of the DLT pipeline
          clusters:
            - autoscale:
                min_workers: 1
                max_workers: 2
I am trying to convert an existing dbx project to bundles.
I have some tasks of type python_wheel_task.
One such task looks like this (they're all similar):
- task_key: "data_raw"
  depends_on:
    - task_key: "process_init"
  job_cluster_key: "somejobcluster"
  python_wheel_task:
    package_name: "myproject"
    entry_point: "data_raw"
  libraries:
    - whl: ./dist/myproject-*.whl
and I have defined the following artifact:
artifacts:
  the_wheel:
    type: whl
    path: .
    build: poetry build
In dbx, the wheel would be installed once on the job-cluster.
Now I noticed that every task is converted to a notebook that contains the following code:
%pip install --force-reinstall /Workspace/Shared/dbx/projects/myproject/.internal/.../myproject-0.0.0-py3-none-any.whl
This seems rather wasteful of run time if you have many tasks that each do small things on the same cluster.
Am I missing a setting, or is this done by design?
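For what it's worth, the notebook-with-%pip conversion looks like the wheel-wrapper workaround used for older Databricks Runtime versions; on DBR 13.1 and later, wheel tasks can run with the wheel attached as an ordinary task library, which the cluster installs once. A hedged sketch reusing the names from the snippet above (the job name and cluster spec are assumed):

```yaml
resources:
  jobs:
    myproject_job:                        # hypothetical job name
      job_clusters:
        - job_cluster_key: somejobcluster
          new_cluster:
            spark_version: 13.3.x-scala2.12   # 13.1+ runs wheel tasks natively
            node_type_id: Standard_DS3_v2     # assumed node type
            num_workers: 1
      tasks:
        - task_key: data_raw
          depends_on:
            - task_key: process_init
          job_cluster_key: somejobcluster
          python_wheel_task:
            package_name: myproject
            entry_point: data_raw
          libraries:
            - whl: ./dist/myproject-*.whl
```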
I would like to be able to tag my resources deployed through Databricks Asset Bundles with the git release tag. Currently only origin_url, branch, and commit are available according to the docs.
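Until that is supported natively, one hedged workaround is to pass the release tag in as a bundle variable and stamp it on resources as a custom tag (the variable and job names here are hypothetical):

```yaml
variables:
  release_tag:
    description: Git release tag to apply to deployed resources
    default: untagged

resources:
  jobs:
    my_job:                          # hypothetical job name
      tags:
        release: ${var.release_tag}
```

Assuming a CLI version with the `--var` flag, the tag could then be supplied at deploy time, e.g. `databricks bundle deploy --var="release_tag=$(git describe --tags)"`.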
We renamed some of the files that are uploaded to the workspace, and the deploy left the old files in place while adding the new ones. Is there a way to force the deploy to keep the files in sync and remove the old ones, so we don't end up with a mess in the workspace over time as changes are made?
Getting an error while deploying a DAB. Below is the error message:
Starting upload of bundle files
Uploaded bundle files at /Users/*****/.bundle/hello-bundle/development/files!
Starting resource deployment
Error: terraform init: exit status 1
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider
databricks/databricks: could not connect to registry.terraform.io: failed to
request discovery document: Get
"https://registry.terraform.io/.well-known/terraform.json": proxyconnect tcp:
dial tcp: lookup sub.proxy.***.com: getaddrinfow: A non-recoverable error
occurred during a database lookup.
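The failure is Terraform (invoked under the hood by `databricks bundle deploy`) being unable to reach registry.terraform.io through the proxy. Terraform generally honors the standard proxy environment variables, so a hedged first step is to set them before deploying (the proxy host and port below are hypothetical placeholders):

```shell
# Hypothetical proxy host/port — substitute your corporate proxy.
export HTTP_PROXY="http://sub.proxy.example.com:8080"
export HTTPS_PROXY="http://sub.proxy.example.com:8080"
export NO_PROXY="localhost,127.0.0.1"
# Terraform honors these variables when fetching the
# databricks/databricks provider from registry.terraform.io.
echo "proxy: $HTTPS_PROXY"
```

If the proxy host itself fails DNS resolution (as the `getaddrinfow` error suggests), the proxy address is wrong or unreachable from this machine, and no CLI setting will help until that is fixed.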