Comments (7)
@riordan I was able to make faster progress on this than I had imagined. Can you try using this branch? https://github.com/Velir/dbt-ga4/tree/union-multiple-bq-exports
The only change you need to make is to replace your dataset variable with datasets: ['dataset1','dataset2'], like this:
datasets: ['analytics_320375000','analytics_237690264']
Your datasets will be unioned, but I confirmed that the incremental materialization still works, so performance should be fine after the first --full-refresh. A new column called ga4_dataset will be available in your models, holding the name of the original dataset.
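As a rough sketch of where that setting might live, assuming the variable is declared at the top level of the vars block in dbt_project.yml (the exact placement in your project may differ, for example scoped under the package name):

```yaml
# dbt_project.yml (sketch; placement of the vars block is an assumption)
vars:
  # Before: a single export dataset
  # dataset: 'analytics_320375000'

  # After: a list of export datasets to union
  datasets: ['analytics_320375000', 'analytics_237690264']
```

After changing the variable, the first run would need --full-refresh so the incremental models are rebuilt from all listed datasets.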
The main issue with this approach is that there's no way (that I know of) to dynamically generate sources. I've removed the src_ga4.yml file for now because it won't be correct. DBT is being told to query the datasets listed as vars rather than calling source('name','table'). This breaks your lineage graph because DBT won't know which source a model originated from.
The fivetran_utils method has the same issue, so clearly their engineers have decided it's worth the risk. But I'd love feedback on the functionality and this issue.
Also, FYI: I removed support for the intraday tables for now. That needs to be rethought.
from dbt-ga4.
I'm coming up short on how to dynamically create multiple DBT sources from a project variable. The key issue is that Jinja can only be used to insert strings into YAML files, but to create sources I would need to insert new objects. Some discussion about this here: https://getdbt.slack.com/archives/C2JRRQDTL/p1656759862173199?thread_ts=1656004021.576959&cid=C2JRRQDTL
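To make the limitation concrete, here is a sketch of what a hand-written sources file covering two datasets would look like. The source names ga4_property_a and ga4_property_b are hypothetical, and this is not the package's actual src_ga4.yml. Each dataset needs its own entry in the sources list, and that list entry is the kind of object that cannot be produced from a variable.

```yaml
# Hypothetical sources file, written by hand (illustration only)
version: 2

sources:
  # One list entry per GA4 export dataset. Jinja can fill in a string
  # value such as `schema`, but it cannot emit additional list entries.
  - name: ga4_property_a          # hypothetical source name
    schema: analytics_320375000
    tables:
      - name: events
        identifier: "events_*"    # daily export tables share this prefix

  - name: ga4_property_b          # hypothetical source name
    schema: analytics_237690264
    tables:
      - name: events
        identifier: "events_*"
```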
from dbt-ga4.
Thanks. One option I'm exploring is the fivetran_utils.union_data method of unioning together data from multiple sources. The potential issue there is performance: that method doesn't take into account any sharding/partitioning, and it may be unwise to union multiple large event datasets.
The fundamental problem is that DBT lacks clear support for namespaces, so there's no native way to load in a package 3 times under 3 namespaces and treat the resulting models as unique. Model names can collide easily in DBT.
So just a heads up that this is on my mind, but it may not be feasible currently. My understanding is that additional support for namespaces is on the DBT roadmap for later this year.
from dbt-ga4.
This is incredible! I'll take it for a spin this weekend!
from dbt-ga4.
If you want to combine multiple sites, it may be better to override the package's base_ga4__events model.
You'd disable the package's default model with this code in your dbt_project.yml:
models:
  ga4:
    staging:
      ga4:
        base:
          base_ga4__events:
            +enabled: false
Then copy the disabled base_ga4__events.sql package file into a file of the same name in your project.
Duplicate the source and renamed blocks in that file and configure the copies to get their data from the second GA4 property.
Then, in the base_ga4__events file, replace the last select statement with a union all statement.
from dbt-ga4.
@dgitis that's a nice work-around. Just keep in mind that anyone manually copying models from the package won't automatically receive updates to those models as the package is updated, and things are changing fast given how early we are in development.
So just something to consider.
from dbt-ga4.
Going to close this for now. There's some discussion in DBT slack requesting Jinja in YAML files, but it doesn't seem like it's going to come any time soon. Just a limitation of DBT right now, I'm afraid. @dgitis suggested a manual work-around that should work when the benefits outweigh the effort in keeping the copied package code up to date.
from dbt-ga4.