Comments (7)

adamribaudo-velir commented on September 1, 2024

@riordan I was able to make faster progress on this than I had imagined. Can you try using this branch? https://github.com/Velir/dbt-ga4/tree/union-multiple-bq-exports

The only change you need to make is to replace your dataset variable with datasets: ['dataset1','dataset2'] like this:

datasets: ['analytics_320375000','analytics_237690264']

Your datasets will be unioned, but I confirmed that the incremental materialization still works so performance should be fine after the first --full-refresh. A new column will be available in your models called ga4_dataset which will hold the name of the original dataset.

The main issue with this approach is as follows: there's no way (that I know of) to dynamically generate sources. I've removed the src_ga4.yml file for now because it won't be correct. DBT is being told to query the datasets listed as vars rather than calling source('name','table'). This breaks your lineage graph because DBT won't know which source a model originated from.

That is the same issue with the fivetran_utils method, so clearly their engineers have decided it's worth the risk. But I'd love feedback on the functionality and this issue.

Also FYI that I removed support for the intraday tables for now. That needs to be rethought.

from dbt-ga4.

adamribaudo-velir commented on September 1, 2024

I'm coming up short on how to dynamically create multiple DBT sources from a project variable. The key issue is that Jinja can only be used to insert strings into YAML files, but to create sources I would need to insert new objects. Some discussion about this here: https://getdbt.slack.com/archives/C2JRRQDTL/p1656759862173199?thread_ts=1656004021.576959&cid=C2JRRQDTL
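
To illustrate the limitation, here is a minimal sketch of a sources file. The source name `ga4`, the table name `events`, and the `events_*` identifier are assumptions made for illustration and are not necessarily what the package's src_ga4.yml contained. A Jinja expression can fill in a single string value such as `schema`, but a loop that would emit one source entry per dataset is the part that does not work, so it appears only as a comment.

```yaml
version: 2

sources:
  - name: ga4                        # hypothetical source name
    # Works: the Jinja expression renders to a single string value.
    schema: "{{ var('dataset') }}"
    tables:
      - name: events                 # hypothetical table name
        identifier: "events_*"       # assumed wildcard over the daily export tables

# Does not work: using Jinja control flow to generate new YAML objects.
# Something like the following would be needed to build one source per
# entry in var('datasets'):
#
#   {% for ds in var('datasets') %}
#   - name: ga4_{{ ds }}
#     schema: "{{ ds }}"
#   {% endfor %}
```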

adamribaudo-velir commented on September 1, 2024

Thanks. One option I'm exploring is the fivetran_utils.union_data method of unioning together data from multiple sources. The potential issue there is related to performance. That method doesn't take into account any sharding/partitioning and it may be unwise to union multiple large event datasets.

The fundamental problem is that DBT lacks clear support for namespaces, so there's no native way to load in a package 3 times under 3 namespaces and treat the resulting models as unique. Model names can collide easily in DBT.

So just a heads up that this is on my mind, but it may not be feasible currently. My understanding is that additional support for namespaces is on the DBT roadmap for later this year.

riordan commented on September 1, 2024

This is incredible! I'll take it for a spin this weekend!

dgitis commented on September 1, 2024

If you want to combine multiple sites, it may be better to override the package's base_ga4__events model.

You'd disable the package's default model with this code in your dbt_project.yml:

models:
  ga4:
    staging:
      ga4:
        base:
          base_ga4__events:
            +enabled: false

Then copy the disabled base_ga4__events.sql package file into a file of the same name in your project.

Duplicate the source and renamed blocks in that file and configure the copies to get their data from the second GA4 property.

Then, in the base_ga4__events file, replace the last select statement with a union all statement.
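
The copied blocks need a second source to read from. A minimal sketch of a sources file in your own project follows, assuming a hypothetical source name `ga4_second_property`, a placeholder GCP project ID, and the second dataset name mentioned earlier in this thread. The `events_*` identifier is likewise an assumption about how the export tables are named.

```yaml
version: 2

sources:
  - name: ga4_second_property        # hypothetical name; should not clash with the package's own source
    database: my-gcp-project         # placeholder: your GCP project ID
    schema: analytics_237690264      # dataset holding the second GA4 property's export
    tables:
      - name: events
        identifier: "events_*"       # assumed wildcard over the daily export tables
```

The duplicated source block in your copy of base_ga4__events.sql would then select from `{{ source('ga4_second_property', 'events') }}` before the final union all.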

adamribaudo-velir commented on September 1, 2024

@dgitis that's a nice workaround. Just keep in mind that anyone manually copying models from the package won't automatically receive updates to those models as the package is updated. And things are changing fast given how early we are in development.

So just something to consider.

adamribaudo-velir commented on September 1, 2024

Going to close this for now. There's some discussion in DBT Slack requesting Jinja in YAML files, but it doesn't seem like it's going to come any time soon. Just a limitation of DBT right now, I'm afraid. @dgitis suggested a manual workaround that should work when the benefits outweigh the effort of keeping the copied package code up to date.
