Comments (12)
☝️ This is with Iceberg tables in AWS S3, using the AWS Glue catalog. It tries to write _dbt_tmp
to the same S3 location as the final table.
from dbt-trino.
I am using Hive Metastore and have encountered the same issue with S3 storage on MinIO. How can this be solved?
from dbt-trino.
Can you provide more details?
What I have so far:
Catalog type = Iceberg
Metastore = Glue or Hive Metastore
Which platform are you on (Trino, Galaxy, SEP?), which versions, and what catalog properties are set?
Which dbt-trino version are you using?
from dbt-trino.
platform = Trino
Trino version = 425 (upgrading to 426 shortly)
Catalog properties = nothing set (so default?)
Running with dbt=1.5.2
Registered adapter: trino=1.5.0
Please let me know if you have any other questions
from dbt-trino.
@mx-dwolff Can you show the exact error log? And can you also show the snapshot model configuration?
from dbt-trino.
error log:
20:27:41 Database Error in snapshot accounts_snapshot (snapshots\accounts_snapshot.sql)
20:27:41 TrinoExternalError(type=EXTERNAL, name=ICEBERG_FILESYSTEM_ERROR, message="Cannot create a table on a non-empty location: s3://bucket_location/iceberg/mgp/protected/accounts_snapshot, set 'iceberg.unique-table-location=true' in your Iceberg catalog properties to use unique table locations for every table.", query_id=20230927_202740_01958_28gf4)
20:27:41
20:27:41 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
from dbt-trino.
model config:
{{
    config(
        materialized = 'snapshot',
        on_table_exists = 'drop',
        unique_key = 'account_number',
        strategy = 'timestamp',
        updated_at = 'derv_updated_at',
        properties = {
            "format": "'PARQUET'",
            "format_version": "2",
            "location": "'s3://bucket_location/iceberg/mgp/protected/accounts_snapshot/'"
        }
    )
}}
from dbt-trino.
FYI -- "bucket_location" is my edit, in place of the actual bucket name.
from dbt-trino.
@mx-dwolff Currently, snapshots do not work correctly when the location property is specified. The snapshot table is initially created in the specified location, and on subsequent runs of the dbt snapshot command, dbt-trino attempts to create a temp table in that same location, which results in the error.
When the location table property is omitted, the content of the table is stored in a subdirectory under the directory corresponding to the schema location (docs on that).
Therefore, omitting the location property would be an immediate solution.
So, is there a specific reason why you are explicitly specifying the table location? Wouldn't the default location (a subdirectory in the schema location) work for your case?
from dbt-trino.
@damian3031 Thanks for this info! I will give that a shot and follow up if errors continue.
I do still find it a bit odd that other dbt operations that utilize a similar approach -- such as an incremental model that uses a merge strategy -- can create temporary views (instead of tables) that avoid this problem altogether. Is there a particular reason an incremental model can utilize a temporary view whereas snapshots require a temporary table? It's not an absolute necessity to specify a location property, but it provides greater clarity about, and control over, where the data is being stored.
from dbt-trino.
@mx-dwolff Using a view puts us at risk of losing track of changes, because in a view the columns are static while the data is dynamic. For example, if the table schema changes during snapshotting, changes could get merged into the snapshot table without the values of columns that were added after the snapshot view was created.
If the snapshot uses a last-modified timestamp, values for columns added since the view was created won't be inserted into the snapshot table. On the next run they will be ignored, since the max modified timestamp in the snapshot table indicates that those rows have already been processed.
Because of the above, we can't use views in the snapshot materialization.
One potential solution could be to create a schema with a specific location first, by adding the config below to dbt_project.yml:
on-run-start: "create schema if not exists snapshots_schema with (location = 's3://datalake/iceberg/mgp/protected/accounts_snapshot')"
then removing location and adding the target_schema='snapshots_schema' property to the model configuration.
This way, the schema is created in the specified location, and tables are created in subdirectories within the schema location. The temporary table will also be created in its own subdirectory, so it won't interfere with the snapshot table.
Specifying this in the on-run-start config may be a bit cumbersome, as it will be executed at the beginning of every dbt command, but it will work.
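As a rough sketch of that workaround, the snapshot config posted earlier in this thread might end up looking like the following, with location dropped and target_schema added. It assumes the snapshots_schema schema from the on-run-start hook already exists. All other keys are copied unchanged from the original config, and this has not been run against a real cluster.

```sql
{# Sketch only: the original config minus "location", plus target_schema. #}
{# Assumes snapshots_schema was created with its own location beforehand. #}
{{
    config(
        materialized = 'snapshot',
        on_table_exists = 'drop',
        target_schema = 'snapshots_schema',
        unique_key = 'account_number',
        strategy = 'timestamp',
        updated_at = 'derv_updated_at',
        properties = {
            "format": "'PARQUET'",
            "format_version": "2"
        }
    )
}}
```

With no explicit location, both the snapshot table and its temp table should land in separate subdirectories under the schema location, which is what avoids the "non-empty location" error.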
There is some discussion about configuring and managing schemas in a similar way to models, which would be the right way to do it: dbt-labs/dbt-core#5781
from dbt-trino.
Currently there is no easy way to support the location property for snapshot models in dbt-trino.
As mentioned, the solution is to remove that property.
Since version 1.7.1, dbt-trino raises an explicit error stating that this combination is not supported.
from dbt-trino.