databricks-asset-bundles-dais2023's People

Contributors

pietern, rafikurlansik

databricks-asset-bundles-dais2023's Issues

Upload jar library without syncing other files/folders

Hi,

I would like to upload an existing JAR as a dependent library to a job/workflow without having to sync any other files/folders.
Currently, all files/folders are always synchronized, but I don't want that; I only need the jar in the target/scala-2.12 folder.

sync:
  include:
    - target/scala-2.12/*.jar

Folder structure:

.
├── README.md
├── build.sbt
├── databricks.yml
├── src
│   └── main
│       ├── resources
│       │   └── ...
│       └── scala
│           └── ...
└── target
    ├── global-logging
    ├── scala-2.12
        └── xxxxxxxxx-assembly-x.x.x.jar

With dbx, this was possible by using file references.
What is the recommended way to do this via DAB, without syncing other files/folders?

I expected this to be possible via artifacts, but that seems to be (for now?) only intended for Python wheels.
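
For reference, the kind of configuration I would expect to be able to write is sketched below. This is only an assumption based on the documented sync include/exclude globs and the libraries setting for job tasks, not a confirmed answer; the job name, task key, and main class are placeholders.

sync:
  include:
    - target/scala-2.12/*.jar
  exclude:
    - src/**
    - target/global-logging/**

resources:
  jobs:
    scala_job:                                  # placeholder job name
      tasks:
        - task_key: main                        # placeholder task key
          spark_jar_task:
            main_class_name: com.example.Main   # placeholder class
          libraries:
            # attach the assembly jar built by sbt as a task library
            - jar: ./target/scala-2.12/xxxxxxxxx-assembly-x.x.x.jar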

Not able to run 'databricks bundle deploy' command

Hello,

When I try to run the 'databricks bundle deploy' command I get the following error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x30 pc=0xf457db]

goroutine 1 [running]:
github.com/databricks/cli/bundle/config.(*Resources).SetConfigFilePath(...)
github.com/databricks/cli/bundle/config/resources.go:118
github.com/databricks/cli/bundle/config.(*Root).SetConfigFilePath(0xc000406840, {0xc000532800, 0x86})
github.com/databricks/cli/cmd/bundle/variables.go:11 +0x29
github.com/spf13/cobra.(*Command).execute(0xc000026f00, {0x1e67d40, 0x0, 0x0})
github.com/spf13/[email protected]/command.go:925 +0x7f6
github.com/spf13/cobra.(*Command).ExecuteC(0xc000004900)
github.com/spf13/[email protected]/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).ExecuteContextC(...)
github.com/spf13/[email protected]/command.go:1001
github.com/databricks/cli/cmd/root.Execute(0x1690ff8?)
github.com/databricks/cli/cmd/root/root.go:99 +0x5b
main.main()
github.com/databricks/cli/main.go:11 +0x2a

I checked my Databricks CLI version and it is v0.206.0, so that shouldn't be the issue.

Kind regards,
Vincent

Failure to re-deploy 'files' directory after delete

Hello,

I managed to create a bundle and deploy it into a dev environment using the 'databricks bundle deploy' command in Visual Studio Code. The command worked without any issues the first time and deployed everything correctly. After that first deploy I wanted to test what would happen if I deleted the deployed .bundle folder from my Workspace Shared folder and ran the deploy again. The second 'databricks bundle deploy' run completed without errors, but when I checked the Workspace, '.bundle/dev/DatabricksDreamTeamBundle/files' had been deployed but was completely empty...

Running the deploys to the stg and prod environments using Azure DevOps Build Pipelines worked without issues and did not show this behaviour (I did the exact same test there), so I wonder why deploying via Visual Studio Code gives different results. Has anyone seen this happen and does anyone know what could be causing it?
Below is the bundle.yaml file I'm using:

bundle:
  name: DatabricksDreamTeamBundle

workspace:
  host: https://adb-xxxxxxxxxx.x.azuredatabricks.net/
  root_path: /Shared/.bundle/${bundle.environment}/${bundle.name}

resources:

  pipelines:
    # A DLT pipeline that processes the data from bronze to gold
    sales_pipeline:
      name: "[${bundle.environment}] DLT Sales"
      target: "sales_${bundle.environment}"
      libraries:
        - notebook:
            path: "./20_bronze/sales_bronze.py"
        - notebook:
            path: "./30_silver/sales_silver.py"
        - notebook:
            path: "./40_gold/sales_gold.py"
      channel: preview

  jobs:
    # A two-task Databricks Workflow - Ingestion + DLT pipeline
    sales_job:
      name: "[${bundle.environment}] Job Sales"
      tasks:
        - task_key: "${bundle.environment}_sales_ingestion_notebook"
          notebook_task:
            notebook_path: "./10_ingestion/sales_ingestion_${bundle.environment}.py"
          new_cluster:
            spark_version: 13.1.x-scala2.12
            num_workers: 1
            node_type_id: Standard_DS3_v2
        - task_key: dlt_sales_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.sales_pipeline.id}
          depends_on:
            - task_key: "${bundle.environment}_sales_ingestion_notebook"

targets:
  dev:
    default: true
    resources:
      pipelines:
        sales_pipeline:
          development: true

  stg:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          name: "[${bundle.environment}] DLT Sales"
          target: "sales_${bundle.environment}"
          channel: preview
          libraries:
            # Adding a notebook to the DLT pipeline that tests the data
            - notebook:
                path: "./50_tests/10_integration/DLT-Pipeline-Test.py"
          development: true

  prod:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          development: false
          # Update the cluster settings of the DLT pipeline
          clusters:
            - autoscale:
                min_workers: 1
                max_workers: 2

pip reinstall is executed in every single task

I am trying to convert a current dbx project to bundles.
I have some tasks of type python_wheel_task.

One such task looks like this (they're all similar):

        - task_key: "data_raw"
          depends_on:
            - task_key: "process_init"
          job_cluster_key: "somejobcluster"
          python_wheel_task:
            package_name: "myproject"
            entry_point: "data_raw"
          libraries:
            - whl: ./dist/myproject-*.whl

and I have defined the following artifact:

    artifacts:
      the_wheel:
        type: whl
        path: .
        build: poetry build

In dbx, the wheel would be installed once on the job-cluster.
Now I noticed that every task is converted to a notebook that contains the following code:

%pip install --force-reinstall /Workspace/Shared/dbx/projects/myproject/.internal/.../myproject-0.0.0-py3-none-any.whl

This seems like a rather wasteful use of running time if you have many tasks that each do small things on the same cluster.

Am I missing a setting, or is this done by design?
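
As a side note, if the bundle loader accepts standard YAML anchors/aliases (an assumption on my part, not something I have verified), the repeated whl reference could at least be defined once and aliased in each task. That only trims the duplication in the YAML; it does not avoid the per-task %pip install:

    # Sketch only: share the library list between tasks via a YAML anchor.
    # Assumes the bundle YAML loader supports standard anchors/aliases;
    # each task still reinstalls the wheel at run time.
    tasks:
      - task_key: "process_init"
        job_cluster_key: "somejobcluster"
        python_wheel_task:
          package_name: "myproject"
          entry_point: "process_init"
        libraries: &project_wheel
          - whl: ./dist/myproject-*.whl
      - task_key: "data_raw"
        depends_on:
          - task_key: "process_init"
        job_cluster_key: "somejobcluster"
        python_wheel_task:
          package_name: "myproject"
          entry_point: "data_raw"
        libraries: *project_wheel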

files are not deleted when renamed

We renamed some of the files that are uploaded to the workspace, and the deploy left the old files there and added the new ones. Is there a way to force it to keep the files in sync and remove the old ones, so we don't end up with a mess in the workspace over time as changes are made?

Error: terraform init: exit status 1

I am getting an error while deploying a DAB. Below is the error message.

Starting upload of bundle files
Uploaded bundle files at /Users/*****/.bundle/hello-bundle/development/files!

Starting resource deployment
Error: terraform init: exit status 1

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
databricks/databricks: could not connect to registry.terraform.io: failed to
request discovery document: Get
"https://registry.terraform.io/.well-known/terraform.json": proxyconnect tcp:
dial tcp: lookup sub.proxy.***.com: getaddrinfow: A non-recoverable error
occurred during a database lookup.
