adamribaudo-velir commented on September 1, 2024

Seems to me the issue is with the test: at the very least, it should include stream_id. @dgitis?

I'd like to hear more about why having page_location set to "/" is a problem, @erikverheij. Can you expand on that, setting the fct_ga4__pages model aside for a minute?

from dbt-ga4.

erikverheij commented on September 1, 2024

Hi @adamribaudo-velir, thanks for your quick reply.

Usually, page_location contains the full URL, including the domain name. I'm not sure how these entries got into our GA4 account.

adamribaudo-velir commented on September 1, 2024

@erikverheij Oh, OK. For some reason I thought page_location didn't include the hostname. It sounds like there's an issue with your data collection, then, yes? Maybe that field is being overwritten with the page path in GTM or something?

Regardless, the faulty data has already been collected, so I think your best bet is to set the severity of the failing test to warn so that it doesn't block your dbt build: https://docs.getdbt.com/reference/resource-configs/severity
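
A minimal sketch of that setting, assuming the package's project name is `ga4` and that you are fine downgrading every test that ships with the package (not only the one failing on fct_ga4__pages). On dbt 1.8 and later, `data_tests:` is the preferred key; `tests:` is the older spelling.

```yaml
# dbt_project.yml (your project)
# Downgrade tests defined in the ga4 package from "error" to "warn",
# so failures are reported by `dbt build` without stopping the run.
tests:
  ga4:
    +severity: warn
```

dbt also lets you set `severity` on a single test through its `config:` block, but for a test defined inside an installed package, the project-level setting is the simpler lever.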

dgitis commented on September 1, 2024

We had a related multi-site issue that was recently fixed by @yamotech.

That test should use column_name: "(event_date_dt || stream_id || page_location)". I think that fix landed after the last release, so updating your packages.yml file to pull from Git will work until a new release comes out:

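```yaml
packages:
  - git: "https://github.com/Velir/dbt-ga4.git"
    revision: main  # tracks the main branch; pin a tag or commit SHA for reproducible builds
```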
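
For context, a model-level uniqueness test on that concatenated expression is declared in a dbt properties file roughly like this. It is a sketch of the general dbt pattern, not a copy of the package's own YAML, and the file path is an assumption.

```yaml
# models/marts/fct_ga4__pages.yml (hypothetical path)
version: 2

models:
  - name: fct_ga4__pages
    tests:
      # Adding stream_id to the key means rows that share a date and
      # page_location but come from different streams are not flagged
      # as duplicates.
      - unique:
          column_name: "(event_date_dt || stream_id || page_location)"
```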

However, that bug would not cause page_location to be set to "/".

This looks to me like a data collection problem: maybe someone decided to override page_location with the page path, thus stripping the domain.

erikverheij commented on September 1, 2024

Hi @dgitis, thx for the suggestion.

> This looks to me to be a data collection problem like maybe someone decided to override the page_location with the page path thus stripping the domain.

It does look like something like that, indeed. Is there a way to hook into the flow at the start, or even before it, to remove these entries before the package transforms them?

dgitis commented on September 1, 2024

The failure is at fct_ga4__pages.

The package is designed so that transformations that apply to all events go in stg_ga4__events, while event-specific transformations happen in the corresponding stg_ga4__event_* model.

What you should do is override the stg_ga4__events model and slip in a step that adds the hostname back into page_location so that the test doesn't fail downstream.

To do this, you'll need to create a stg_ga4__events model in your project's models folder (anywhere is fine, but I recommend using folders for organization), copy over the contents of the package's stg_ga4__events file, and modify the code to fix page_location before that field gets processed any further.

Then you'll need to disable the package's version of the model in the models section of your dbt_project.yml file, like this:

```yaml
models:
  ga4:
    staging:
      stg_ga4__events:
        +enabled: false
```

That code was done from memory, so it may not be exactly correct.

adamribaudo-velir commented on September 1, 2024

Just an idea, @erikverheij, but rather than overriding the stg_ga4__events model, you could override the 'base select' macro to insert your own logic for parsing page_location: https://github.com/Velir/dbt-ga4/blob/main/macros/base_select.sql

Only slightly cleaner, but thought I'd throw it out there.
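
One dbt-native way to wire up such a macro override is the `dispatch` config in dbt_project.yml, which tells dbt to search your own project for a macro implementation before the package's. This sketch assumes the package's base_select macros are invoked through `adapter.dispatch(..., 'ga4')`, so check base_select.sql before relying on it; `my_project` is a placeholder for your own project's name.

```yaml
# dbt_project.yml (your project)
# Assumption: ga4 macros are called via adapter.dispatch(..., 'ga4').
# dbt then searches these projects in order, so a same-named
# implementation in my_project takes precedence over the package's.
dispatch:
  - macro_namespace: ga4
    search_order: ['my_project', 'ga4']
```

Your override would then live in your project's macros folder under the same adapter-prefixed name as the package's implementation (for example `default__<macro name>`), with the extra page_location handling added.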

erikverheij commented on September 1, 2024

Thanks, guys, for providing some ways to work around it! That will do the trick for me for now.

It would be very nice if there were a way to do some filtering before everything starts, without needing to override logic from the package. I'm not experienced with dbt, but perhaps something like this would be possible:

Adding a placeholder macro to a WHERE clause in base_ga4__events that contains no logic of its own but can be overridden to filter out specific events.

Note that this is only a suggestion for the nice-to-have list, as the workaround works for me. Such a feature would make it easier to stay aligned with this repo's logic.

dgitis commented on September 1, 2024

I've definitely thought about adding a custom SQL variable that, if set, adds a with custom_sql as ({{ var('model_name_custom_sql') }}) block or, more likely, some sort of {% if macro_exists('model_name_custom_sql') %} logic, but I feel like both of those options are more complex than overriding the existing macros/models using dbt-native mechanisms.
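
Purely to illustrate the first option: if such a variable existed, setting it from your own project could look like the following. `model_name_custom_sql` is the hypothetical name from the comment above, the package does not read it today, and the CTE name `base_events` and the filter are invented examples aimed at the "/" rows discussed in this thread.

```yaml
# dbt_project.yml (your project)
# HYPOTHETICAL: dbt-ga4 does not read this variable. It only shows how a
# package-scoped var could carry custom SQL into one of the package's models.
vars:
  ga4:
    model_name_custom_sql: >
      select * from base_events
      where page_location != '/'
```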
