
Comments (12)

jusdino commented on July 18, 2024

☝️ This is with Iceberg tables in AWS S3, using the AWS Glue catalog. It tries to write the _dbt_tmp table to the same S3 location as the final table.

from dbt-trino.

Firstero commented on July 18, 2024

I am using Hive Metastore and have encountered the same issue with S3 storage on MinIO. How can this be solved?


mdesmet commented on July 18, 2024

Can you provide more details?

Here is what I have so far:

Catalog type = Iceberg
Metastore = Glue or Hive Metastore

On which platform (Trino, Galaxy, SEP)? Which versions, and what catalog properties are set?

Which dbt-trino version are you using?


mx-dwolff commented on July 18, 2024

platform = Trino
Trino version = 425 (upgrading to 426 shortly)
Catalog properties = nothing set (so default?)
Running with dbt=1.5.2
Registered adapter: trino=1.5.0

Please let me know if you have any other questions


damian3031 commented on July 18, 2024

@mx-dwolff Can you show the exact error log? And can you also show the snapshot model configuration?


mx-dwolff commented on July 18, 2024

error log:

20:27:41 Database Error in snapshot accounts_snapshot (snapshots\accounts_snapshot.sql)
20:27:41 TrinoExternalError(type=EXTERNAL, name=ICEBERG_FILESYSTEM_ERROR, message="Cannot create a table on a non-empty location: s3://bucket_location/iceberg/mgp/protected/accounts_snapshot, set 'iceberg.unique-table-location=true' in your Iceberg catalog properties to use unique table locations for every table.", query_id=20230927_202740_01958_28gf4)
20:27:41
20:27:41 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1


mx-dwolff commented on July 18, 2024

model config:

{{
    config(
        materialized = 'snapshot',
        on_table_exists = 'drop',
        unique_key = 'account_number',
        strategy = 'timestamp',
        updated_at = 'derv_updated_at',
        properties = {
            "format": "'PARQUET'",
            "format_version": "2",
            "location": "'s3://bucket_location/iceberg/mgp/protected/accounts_snapshot/'"
        }
    )
}}


mx-dwolff commented on July 18, 2024

FYI -- "bucket_location" was my edit in place of actual bucket name


damian3031 commented on July 18, 2024

@mx-dwolff Currently, snapshots do not work correctly when the location property is specified. The snapshot table is initially created in the specified location, and on subsequent runs of the dbt snapshot command dbt-trino attempts to create the temporary table in that same location, which results in the error.

When the location table property is omitted, the content of the table is stored in a subdirectory under the directory corresponding to the schema location (docs on that).
Therefore, omitting the location property would be an immediate solution.

So, is there a specific reason why you are explicitly specifying the table location? Wouldn't the default location (a subdirectory in the schema location) work for your case?


mx-dwolff commented on July 18, 2024

@damian3031 Thanks for this info! I will give that a shot and follow up if errors continue.

I do still find it a bit odd that other dbt operations that use a similar approach, such as an incremental model with a merge strategy, can create temporary views (instead of tables) and so avoid this problem altogether. Is there a particular reason an incremental model can use a temporary view whereas snapshots require a temporary table? It's not an absolute necessity to specify a location property; however, it helps provide greater clarity and control over where the data is being stored.


damian3031 commented on July 18, 2024

@mx-dwolff Using a view puts us at risk of losing track of changes, because in a view the columns are static while the data is dynamic. For example, if the table schema is changed during snapshotting, changes could get merged into the snapshot table without the values of columns added after the snapshot view was created.
If the snapshot uses a last-modified timestamp, any values for columns added since the view was created won't be inserted into the snapshot table. On the next run they will be ignored, because the max modified timestamp in the snapshot table indicates that those rows have already been processed.

Because of the above, we can't use views in snapshot materialization.

One potential solution could be to create a schema with a specific location first, by adding the config below to dbt_project.yml:

on-run-start: "create schema if not exists snapshots_schema with (location = 's3://datalake/iceberg/mgp/protected/accounts_snapshot')"

Then remove location and add the target_schema='snapshots_schema' property to the model configuration.
This way, the schema is created in the specified location, and tables are created in subdirectories within the schema location. The temporary table will also be created in its own subdirectory, so it won't interfere with the snapshot table.

It may be a bit cumbersome to specify this in the on-run-start config, as it will be executed at the beginning of every dbt command, but it will work.
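
As a rough sketch of that workaround, the model configuration posted earlier in this thread might end up looking like the following. This assumes the schema name snapshots_schema from the on-run-start hook above; the other settings are copied unchanged from the original config, and only the location entry is dropped. It has not been run against a real catalog, so treat it as an illustration rather than a verified setup.

```sql
{# Sketch only: assumes the schema "snapshots_schema" was already created by the on-run-start hook. #}
{# "location" is intentionally omitted so the table lands in a subdirectory of the schema location. #}
{{
    config(
        materialized = 'snapshot',
        on_table_exists = 'drop',
        target_schema = 'snapshots_schema',
        unique_key = 'account_number',
        strategy = 'timestamp',
        updated_at = 'derv_updated_at',
        properties = {
            "format": "'PARQUET'",
            "format_version": "2"
        }
    )
}}
```

With this layout the snapshot table and its temporary table should each get their own subdirectory under the schema location, which is what avoids the non-empty location error.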

There is some discussion about configuring and managing schemas in a similar way to models, which would be the right way to do it: dbt-labs/dbt-core#5781


damian3031 commented on July 18, 2024

Currently there is no easy way to support the location property for snapshot models in dbt-trino.
As mentioned, the solution is to remove that property.
Since version 1.7.1, dbt-trino raises an explicit error stating that this combination is not supported.

