Sample YAML File (Cookie-Cutter):
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "9.1.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "i3.xlarge"

# please note that we're using a FUSE reference for the config file, hence we're going to load it using its local FS path
environments:
  default:
    strict_path_adjustment_policy: true
    jobs:
      - name: "test-dbx-sample"
        <<:
          - *basic-static-cluster
        spark_python_task:
          python_file: "file://dbx_package/jobs/sample/entrypoint.py"
          parameters: ["--conf-file", "file:fuse://conf/test/sample.yml"]
      - name: "test-dbx-sample-integration-test"
        <<:
          - *basic-static-cluster
        spark_python_task:
          python_file: "file://tests/integration/sample_test.py"
          parameters: ["--conf-file", "file:fuse://conf/test/sample.yml"]
Expected Behavior
Deploying a single job should succeed with output like the following:
(base) vdi:~/git/test-dbx$ dbx deploy --deployment-file conf/deployment.yml --jobs test-dbx-sample
[dbx][2022-01-24 14:43:23.594] Starting new deployment for environment default
[dbx][2022-01-24 14:43:23.595] No environment variables provided, using the ~/.databrickscfg
[dbx][2022-01-24 14:43:24.530] Re-building package
[dbx][2022-01-24 14:43:25.362] Package re-build finished
[dbx][2022-01-24 14:43:25.362] Locating package file
[dbx][2022-01-24 14:43:25.362] Package file located in: dist/dbx_package-0.0.1-py3-none-any.whl
[dbx][2022-01-24 14:43:25.370] Requirements file is not provided
[dbx][2022-01-24 14:43:25.370] Deployment will be performed only for the following jobs: ['test-dbx-sample']
[dbx][2022-01-24 14:43:25.878] Deploying file: dbx_package/jobs/sample/entrypoint.py
[dbx][2022-01-24 14:43:26.769] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 14:43:27.380] Deploying file: dist/dbx_package-0.0.1-py3-none-any.whl
[dbx][2022-01-24 14:43:27.798] Updating job definitions
[dbx][2022-01-24 14:43:27.799] Processing deployment for job: test-dbx-sample
[dbx][2022-01-24 14:43:27.849] Creating a new job with name test-dbx-sample
[dbx][2022-01-24 14:43:28.481] Updating job definitions - done
[dbx][2022-01-24 14:43:29.127] Deployment for environment default finished successfully ✨
Current Behavior
However, if I deploy multiple jobs that need the same conf file, or a job with multiple tasks sharing the same conf file, the deployment fails with the error below:
(base) vdi:~/git/test-dbx$ dbx deploy --deployment-file conf/deployment.yml
[dbx][2022-01-24 14:47:20.060] Starting new deployment for environment default
[dbx][2022-01-24 14:47:20.060] No environment variables provided, using the ~/.databrickscfg
[dbx][2022-01-24 14:47:21.080] Re-building package
[dbx][2022-01-24 14:47:21.896] Package re-build finished
[dbx][2022-01-24 14:47:21.896] Locating package file
[dbx][2022-01-24 14:47:21.896] Package file located in: dist/dbx_package-0.0.1-py3-none-any.whl
[dbx][2022-01-24 14:47:21.905] Requirements file is not provided
[dbx][2022-01-24 14:47:22.409] Deploying file: dbx_package/jobs/sample/entrypoint.py
[dbx][2022-01-24 14:47:23.367] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 14:47:24.069] Deploying file: dist/dbx_package-0.0.1-py3-none-any.whl
[dbx][2022-01-24 14:47:24.594] Deploying file: tests/integration/sample_test.py
[dbx][2022-01-24 14:47:25.093] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 14:47:26.255] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 14:47:26.701] Deploying file: conf/test/sample.yml
Traceback (most recent call last):
File "/home/username/anaconda3/bin/dbx", line 8, in <module>
sys.exit(cli())
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 174, in deploy
_adjust_job_definitions(
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 313, in _adjust_job_definitions
_walk_content(adjustment_callback, job)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 499, in _walk_content
_walk_content(func, item, content, key)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 499, in _walk_content
_walk_content(func, item, content, key)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 502, in _walk_content
_walk_content(func, sub_item, content, idx)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 504, in _walk_content
parent[index] = func(content)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 307, in adjustment_callback
return _adjust_path(p, artifact_base_uri, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 569, in _adjust_path
adjusted_path = _strict_path_adjustment(candidate, adjustment, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 532, in _strict_path_adjustment
_upload_file(local_path, adjusted_path, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 511, in _upload_file
file_uploader.upload_file(local_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/username/anaconda3/lib/python3.8/site-packages/retry/api.py", line 73, in retry_decorator
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
File "/home/username/anaconda3/lib/python3.8/site-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/utils/common.py", line 410, in upload_file
mlflow.log_artifact(str(file_path), str(posix_path.parent))
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 605, in log_artifact
MlflowClient().log_artifact(run_id, local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/client.py", line 955, in log_artifact
self._tracking_client.log_artifact(run_id, local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 355, in log_artifact
artifact_repo.log_artifact(local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 119, in log_artifact
self._databricks_api_request(
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 61, in _databricks_api_request
return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 162, in http_request_safe
return verify_rest_response(response, endpoint)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 175, in verify_rest_response
raise MlflowException("%s. Response body: '%s'" % (base_msg, response.text))
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/dbx/test-dbx/efb1d96bdba44aa2bf9129a681c1fa32/artifacts/conf/test/sample.yml failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 </title>
</head>
<body>
<h2>HTTP ERROR: 409</h2>
<p>Problem accessing /dbfs/dbx/test-dbx/efb1d96bdba44aa2bf9129a681c1fa32/artifacts/conf/test/sample.yml. Reason:
<pre> File already exists, cannot overwrite: '/dbx/test-dbx/efb1d96bdba44aa2bf9129a681c1fa32/artifacts/conf/test/sample.yml'</pre></p>
<hr />
</body>
</html>
'
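The 409 comes straight from the DBFS artifact endpoint rejecting an overwrite: every reference to conf/test/sample.yml triggers its own upload into the same run's artifact path. A minimal repro sketch outside of dbx (assuming the MLflow setup dbx configures, i.e. a Databricks tracking URI with a dbfs:/ artifact location, an experiment set, and the file present locally):

# Repro sketch: uploading the same local file twice to the same DBFS
# artifact path fails, because the DBFS artifact repo never overwrites.
import mlflow

with mlflow.start_run():
    mlflow.log_artifact("conf/test/sample.yml", "conf/test")  # ok, file created
    mlflow.log_artifact("conf/test/sample.yml", "conf/test")  # MlflowException:
    # ... failed with error code 409 != 200 ... "File already exists, cannot overwrite"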
Steps to Reproduce (for bugs):
Execute dbx deploy --deployment-file conf/deployment.yml with the YAML above.
Or with a YAML that defines multiple tasks in a single job, like this:
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "9.1.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "i3.xlarge"

# please note that we're using a FUSE reference for the config file, hence we're going to load it using its local FS path
environments:
  default:
    strict_path_adjustment_policy: true
    jobs:
      - name: "multiple-task-example"
        tasks:
          - task_key: "test-dbx-sample"
            <<: *basic-static-cluster
            spark_python_task:
              python_file: "file://dbx_package/jobs/sample/entrypoint.py"
              parameters: ["--conf-file", "file:fuse://conf/test/sample.yml"]
          - task_key: "test-dbx-sample-integration-test"
            <<: *basic-static-cluster
            spark_python_task:
              python_file: "file://tests/integration/sample_test.py"
              parameters: ["--conf-file", "file:fuse://conf/test/sample.yml"]
            depends_on:
              - task_key: "test-dbx-sample"
Error:
(base) vdi:~/git/test-dbx$ dbx deploy --deployment-file conf/deployment.yml
[dbx][2022-01-24 15:00:54.997] Starting new deployment for environment default
[dbx][2022-01-24 15:00:54.998] No environment variables provided, using the ~/.databrickscfg
[dbx][2022-01-24 15:00:55.769] Re-building package
[dbx][2022-01-24 15:00:56.521] Package re-build finished
[dbx][2022-01-24 15:00:56.521] Locating package file
[dbx][2022-01-24 15:00:56.521] Package file located in: dist/dbx_package-0.0.1-py3-none-any.whl
[dbx][2022-01-24 15:00:56.531] Requirements file is not provided
[dbx][2022-01-24 15:00:57.112] Deploying file: dbx_package/jobs/sample/entrypoint.py
[dbx][2022-01-24 15:00:58.253] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 15:00:58.825] Deploying file: tests/integration/sample_test.py
[dbx][2022-01-24 15:00:59.370] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 15:01:00.550] Deploying file: conf/test/sample.yml
[dbx][2022-01-24 15:01:01.086] Deploying file: conf/test/sample.yml
Traceback (most recent call last):
File "/home/username/anaconda3/bin/dbx", line 8, in <module>
sys.exit(cli())
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/username/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 174, in deploy
_adjust_job_definitions(
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 313, in _adjust_job_definitions
_walk_content(adjustment_callback, job)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 499, in _walk_content
_walk_content(func, item, content, key)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 502, in _walk_content
_walk_content(func, sub_item, content, idx)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 499, in _walk_content
_walk_content(func, item, content, key)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 499, in _walk_content
_walk_content(func, item, content, key)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 502, in _walk_content
_walk_content(func, sub_item, content, idx)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 504, in _walk_content
parent[index] = func(content)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 307, in adjustment_callback
return _adjust_path(p, artifact_base_uri, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 569, in _adjust_path
adjusted_path = _strict_path_adjustment(candidate, adjustment, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 532, in _strict_path_adjustmentr
_upload_file(local_path, adjusted_path, file_uploader)
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/commands/deploy.py", line 511, in _upload_file
file_uploader.upload_file(local_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/username/anaconda3/lib/python3.8/site-packages/retry/api.py", line 73, in retry_decorator
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
File "/home/username/anaconda3/lib/python3.8/site-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/home/username/anaconda3/lib/python3.8/site-packages/dbx/utils/common.py", line 410, in upload_file
mlflow.log_artifact(str(file_path), str(posix_path.parent))
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 605, in log_artifact
MlflowClient().log_artifact(run_id, local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/client.py", line 955, in log_artifact
self._tracking_client.log_artifact(run_id, local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 355, in log_artifact
artifact_repo.log_artifact(local_path, artifact_path)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 119, in log_artifact
self._databricks_api_request(
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/store/artifact/dbfs_artifact_repo.py", line 61, in _databricks_api_request
return http_request_safe(host_creds=host_creds, endpoint=endpoint, method=method, **kwargs)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 162, in http_request_safe
return verify_rest_response(response, endpoint)
File "/home/username/anaconda3/lib/python3.8/site-packages/mlflow/utils/rest_utils.py", line 175, in verify_rest_response
raise MlflowException("%s. Response body: '%s'" % (base_msg, response.text))
mlflow.exceptions.MlflowException: API request to endpoint /dbfs/dbx/test-dbx/7b4271f8e5744df3b4fac82e7a679564/artifacts/conf/test/sample.yml failed with error code 409 != 200. Response body: '<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 409 </title>
</head>
<body>
<h2>HTTP ERROR: 409</h2>
<p>Problem accessing /dbfs/dbx/test-dbx/7b4271f8e5744df3b4fac82e7a679564/artifacts/conf/test/sample.yml. Reason:
<pre> File already exists, cannot overwrite: '/dbx/test-dbx/7b4271f8e5744df3b4fac82e7a679564/artifacts/conf/test/sample.yml'</pre></p>
<hr />
</body>
</html>
'
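In both cases the failure looks like a pure idempotency problem: the same local file is uploaded once per reference rather than once per deployment, and the DBFS artifact endpoint refuses the overwrite. A hypothetical memoizing wrapper (invented names, not dbx's actual code) sketches the deduplication I would expect around the single-argument upload_file seen in the traceback:

# Hypothetical sketch only -- not dbx's implementation. It deduplicates
# uploads by resolved local path so each file is uploaded at most once.
from pathlib import Path

class MemoizingUploader:
    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # e.g. a function that wraps mlflow.log_artifact
        self._uploaded = set()

    def upload_file(self, local_path):
        resolved = Path(local_path).resolve()
        if resolved in self._uploaded:
            # Repeated references (e.g. conf/test/sample.yml in several tasks)
            # are skipped, so the DBFS 409 "cannot overwrite" never triggers.
            return
        self._upload_fn(str(resolved))
        self._uploaded.add(resolved)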
Context
I've developed a generic job that processes a table based on a command-line parameter, picking up the relevant info (source and target paths, etc.) from a conf file that is shared per data-processing layer (one for bronze, one for silver, one for gold).
Since these will be multiple tasks running in parallel but sharing the same code, they share the same conf file in their task definitions (a sketch of the entrypoint side follows below):
spark_python_task:
  python_file: "file://test_dbx/jobs/bronze/entrypoint.py"
  parameters: ["--conf-file", "file:fuse://conf/dev/bronze.yml", "--table_name", "{{task_key}}"]
Your Environment
- dbx version used: 0.3.0
- Databricks Runtime version: 9.1.x-cpu-ml-scala2.12 (9.1)