databricks-asset-bundles-dais2023's People

Contributors

pietern, rafikurlansik

databricks-asset-bundles-dais2023's Issues

Upload jar library without syncing other files/folders

Hi,

I would like to upload an existing JAR as a dependent library to a job/workflow without having to sync any other files/folders.
Currently, all files/folders are always synchronized, but I don't want that; I only need the jar in the target/scala-2.12 folder.

sync:
  include:
    - target/scala-2.12/*.jar

Folder structure:

.
├── README.md
├── build.sbt
├── databricks.yml
├── src
│   └── main
│       ├── resources
│       │   └── ...
│       └── scala
│           └── ...
└── target
    ├── global-logging
    ├── scala-2.12
        └── xxxxxxxxx-assembly-x.x.x.jar

With dbx, this was possible by using file references.
What is the recommended way to do this via DAB, without syncing other files/folders?

I expected this to be possible via artifacts, but that seems to be (for now?) only intended for Python wheels.
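
For reference, the kind of configuration I would expect to be able to write is sketched below. This is only an assumption based on the documented sync include/exclude globs and the libraries setting for job tasks, not a confirmed answer; the job name, task key, and main class are placeholders.

sync:
  include:
    - target/scala-2.12/*.jar
  exclude:
    - src/**
    - target/global-logging/**

resources:
  jobs:
    scala_job:                                  # placeholder job name
      tasks:
        - task_key: main                        # placeholder task key
          spark_jar_task:
            main_class_name: com.example.Main   # placeholder class
          libraries:
            # attach the assembly jar built by sbt as a task library
            - jar: ./target/scala-2.12/xxxxxxxxx-assembly-x.x.x.jar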

Not able to run 'databricks bundle deploy' command

Hello,

When I try to run the 'databricks bundle deploy' command I get the following error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x1 addr=0x30 pc=0xf457db]

goroutine 1 [running]:
github.com/databricks/cli/bundle/config.(*Resources).SetConfigFilePath(...)
github.com/databricks/cli/bundle/config/resources.go:118
github.com/databricks/cli/bundle/config.(*Root).SetConfigFilePath(0xc000406840, {0xc000532800, 0x86})
github.com/databricks/cli/cmd/bundle/variables.go:11 +0x29
github.com/spf13/cobra.(*Command).execute(0xc000026f00, {0x1e67d40, 0x0, 0x0})
github.com/spf13/[email protected]/command.go:925 +0x7f6
github.com/spf13/cobra.(*Command).ExecuteC(0xc000004900)
github.com/spf13/[email protected]/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).ExecuteContextC(...)
github.com/spf13/[email protected]/command.go:1001
github.com/databricks/cli/cmd/root.Execute(0x1690ff8?)
github.com/databricks/cli/cmd/root/root.go:99 +0x5b
main.main()
github.com/databricks/cli/main.go:11 +0x2a

I checked my Databricks CLI version and it is v0.206.0, so that shouldn't be the issue.

Kind regards,
Vincent

Failure to re-deploy 'files' directory after delete

Hello,

I managed to create a bundle and deploy it into a dev environment using the 'databricks bundle deploy' command in Visual Studio Code. The command worked without any issues the first time and deployed everything correctly. After that first deploy I wanted to test what would happen if I deleted the deployed .bundle folder from my Workspace Shared folder and ran the deploy again. The second 'databricks bundle deploy' run completed without errors, but when I checked the Workspace, '.bundle/dev/DatabricksDreamTeamBundle/files' had been deployed but was completely empty...

Running the deploys to the stg and prod environments using Azure DevOps Build Pipelines worked without issues and did not show this behaviour (I did the exact same test there), so I wonder why deploying via Visual Studio Code gives different results. Has anyone seen this happen and does anyone know what could be causing it?
Below is the bundle.yaml file I'm using:

bundle:
  name: DatabricksDreamTeamBundle

workspace:
  host: https://adb-xxxxxxxxxx.x.azuredatabricks.net/
  root_path: /Shared/.bundle/${bundle.environment}/${bundle.name}

resources:

  pipelines:
    # A DLT pipeline that processes the data from bronze to gold
    sales_pipeline:
      name: "[${bundle.environment}] DLT Sales"
      target: "sales_${bundle.environment}"
      libraries:
        - notebook:
            path: "./20_bronze/sales_bronze.py"
        - notebook:
            path: "./30_silver/sales_silver.py"
        - notebook:
            path: "./40_gold/sales_gold.py"
      channel: preview

  jobs:
    # A two-task Databricks Workflow - Ingestion + DLT pipeline
    sales_job:
      name: "[${bundle.environment}] Job Sales"
      tasks:
        - task_key: "${bundle.environment}_sales_ingestion_notebook"
          notebook_task:
            notebook_path: "./10_ingestion/sales_ingestion_${bundle.environment}.py"
          new_cluster:
            spark_version: 13.1.x-scala2.12
            num_workers: 1
            node_type_id: Standard_DS3_v2
        - task_key: dlt_sales_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.sales_pipeline.id}
          depends_on:
            - task_key: "${bundle.environment}_sales_ingestion_notebook"

targets:
  dev:
    default: true
    resources:
      pipelines:
        sales_pipeline:
          development: true

  stg:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          name: "[${bundle.environment}] DLT Sales"
          target: "sales_${bundle.environment}"
          channel: preview
          libraries:
            # Adding a notebook to the DLT pipeline that tests the data
            - notebook:
                path: "./50_tests/10_integration/DLT-Pipeline-Test.py"
          development: true

  prod:
    workspace:
      host: https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/
    resources:
      pipelines:
        sales_pipeline:
          development: false
          # Update the cluster settings of the DLT pipeline
          clusters:
            - autoscale:
                min_workers: 1
                max_workers: 2

pip reinstall is executed in every single task

I am trying to convert a current dbx project to bundles.
I have some tasks of type python_wheel_task.

One such task looks like this (they're all similar):

        - task_key: "data_raw"
          depends_on:
            - task_key: "process_init"
          job_cluster_key: "somejobcluster"
          python_wheel_task:
            package_name: "myproject"
            entry_point: "data_raw"
          libraries:
            - whl: ./dist/myproject-*.whl

and I have defined the following artifact:

    artifacts:
      the_wheel:
        type: whl
        path: .
        build: poetry build

In dbx, the wheel would be installed once on the job-cluster.
Now I noticed that every task is converted to a notebook that contains the following code:

%pip install --force-reinstall /Workspace/Shared/dbx/projects/myproject/.internal/.../myproject-0.0.0-py3-none-any.whl

This seems like a rather wasteful use of running time if you have many tasks that each do small things on the same cluster.

Am I missing a setting, or is this done by design?
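
As a side note, if the bundle loader accepts standard YAML anchors/aliases (an assumption on my part, not something I have verified), the repeated whl reference could at least be defined once and aliased in each task. That only trims the duplication in the YAML; it does not avoid the per-task %pip install:

    # Sketch only: share the library list between tasks via a YAML anchor.
    # Assumes the bundle YAML loader supports standard anchors/aliases;
    # each task still reinstalls the wheel at run time.
    tasks:
      - task_key: "process_init"
        job_cluster_key: "somejobcluster"
        python_wheel_task:
          package_name: "myproject"
          entry_point: "process_init"
        libraries: &project_wheel
          - whl: ./dist/myproject-*.whl
      - task_key: "data_raw"
        depends_on:
          - task_key: "process_init"
        job_cluster_key: "somejobcluster"
        python_wheel_task:
          package_name: "myproject"
          entry_point: "data_raw"
        libraries: *project_wheel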

files are not deleted when renamed

We renamed some of the files that are uploaded to the workspace, and the deploy left the old files there and added the new ones. Is there a way to force it to keep the files in sync and remove the old ones, so we don't end up with a mess in the workspace over time as changes are made?

Error: terraform init: exit status 1

I am getting an error while deploying a DAB. Below is the error message.

Starting upload of bundle files
Uploaded bundle files at /Users/*****/.bundle/hello-bundle/development/files!

Starting resource deployment
Error: terraform init: exit status 1

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
databricks/databricks: could not connect to registry.terraform.io: failed to
request discovery document: Get
"https://registry.terraform.io/.well-known/terraform.json": proxyconnect tcp:
dial tcp: lookup sub.proxy.***.com: getaddrinfow: A non-recoverable error
occurred during a database lookup.
