
Engagement tracking (dbt-docs issue, CLOSED)

dbt-labs avatar dbt-labs commented on May 14, 2024 1
Engagement tracking

from dbt-docs.

Comments (10)

mferryRV avatar mferryRV commented on May 14, 2024 3

Thanks for creating this issue, @jtcohen6!

Context

If documentation is one reason dbt outclasses competing solutions, then we need a way to validate that claim after the fact. Doing so with data is both possible and compelling.

From my perspective, I'd like a holistic view of how people are engaging with the data and information my team curates. For our stack, that's:

  • Audit logs on BigQuery tables
  • Usage logs for Tableau dashboards, and
  • Page view logs on documentation

If I see that people are building dashboards on a table but not reading its documentation, then either they're not familiar with the docs or the docs are not valuable. That's useful feedback that I can glean without interrupting my users. It's also actionable because it helps us decide how much time we should spend documenting things.

Requirements

To meet that use case, I would ideally like page view data from all documentation repositories (e.g. Confluence, dbt docs, Notion) loaded into my data warehouse. I could then marry this with the query audit logs and Tableau usage data for the same users to create a holistic picture of their interactions with data.
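As a rough sketch of the join described above, the snippet below flags users who use a table without ever viewing its documentation. The row shapes, field names, and sample values are hypothetical; in practice this would more likely be a SQL model in the warehouse.

```typescript
// Hypothetical row shapes: one row per (user, table) interaction.
interface UsageRow { userId: string; table: string }   // e.g. from query audit logs or dashboard usage
interface DocViewRow { userId: string; table: string } // e.g. from documentation page views

// Return usage rows with no matching documentation view for the same user and table.
function usageWithoutDocViews(usage: UsageRow[], docViews: DocViewRow[]): UsageRow[] {
  // Build a lookup key per (user, table) pair; "\u0000" is just an unlikely separator.
  const viewed = new Set(docViews.map((v) => `${v.userId}\u0000${v.table}`));
  return usage.filter((u) => !viewed.has(`${u.userId}\u0000${u.table}`));
}

// Illustrative data only.
const usage: UsageRow[] = [
  { userId: "ana", table: "orders" },
  { userId: "ben", table: "orders" },
];
const docViews: DocViewRow[] = [{ userId: "ana", table: "orders" }];

const unread = usageWithoutDocViews(usage, docViews);
```

Here `unread` contains only the row for `ben`, who used `orders` without viewing its documentation.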

The data I'm interested in for dbt docs is primarily for the dbt Cloud use case:

  • Logged-in user IDs (that map to email addresses or an SSO identity) on every event
  • Page view events including full URL
  • DAG views, ideally including the command that was used, e.g. +my_model+, any tag filters, and the node on which the user was focused

Nice to have:

  • Click events to expand column documentation
  • Scroll events when each section enters the viewport (Details, Description, Columns, SQL)

This would be plenty of info for me to see who is aware of the documentation and who is really using the documentation.

Implementation

I'm a fan of Google Tag Manager for decoupling release cycles between marketing analytics and product features. My inclination is to use this as a solution for managing page view tracking.

To enable the event tracking above, you would push events to the dataLayer variable any time something interesting happened, and it would be simple to create triggers in GTM.
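A minimal sketch of what such a push could look like follows. GTM conventionally reads from a global array named `dataLayer`; the event names, fields, and URL below are hypothetical and are not an existing dbt-docs API.

```typescript
// Hypothetical event shape for illustration only.
interface DocsEvent {
  event: "docs_page_view" | "docs_dag_view" | "docs_column_expand";
  url: string;
  userId?: string;
  selector?: string; // e.g. "+my_model+" for a DAG view
}

// Anything that may carry a dataLayer array; in a browser this would be `window`.
type DataLayerHost = { dataLayer?: unknown[] };

function pushDocsEvent(host: DataLayerHost, evt: DocsEvent): void {
  // Create the array if it does not exist yet, then append the event.
  const layer = host.dataLayer ?? (host.dataLayer = []);
  layer.push(evt);
}

// A plain object stands in for `window` so the sketch runs outside a browser.
const host: DataLayerHost = {};
pushDocsEvent(host, {
  event: "docs_dag_view",
  url: "/#!/model/model.my_project.my_model", // made-up URL
  selector: "+my_model+",
});
```

A GTM trigger could then match on the `event` field, with the remaining fields read as data layer variables.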

But loading a GTM container gives the opportunity for arbitrary JS injection, so there are a few possibilities:

  1. Free-for-all: Docs functionality could be modified to provide a configuration option for the GTM container ID to be loaded on the page. We could create an example GTM container with all the variables & triggers pre-built. Users would then import that example container and add their own tags. To guide implementation of those tags, it's possible to create tag templates that simplify configuration of otherwise custom HTML tags. These are very useful for things like the Facebook pixel, but would also be a great solution for guiding particular tracking implementations to work with the existing variables and triggers.
  2. Managed Containers: For dbt Cloud users, provide a GTM container ID and grant clients user access. Releases could be approved by the Fishtown team. Clients would have edit access to propose changes.
  3. Universal Container: Allow configuration of a universally applicable GTM container. Fishtown (or the broader community) would maintain a GTM container with tags that send data to your favourite trackers (GA, Segment, Snowplow). Users would supply a destination for pre-built tracking tags through configuration options in the dbt repo. That configuration would populate a GTM variable that configures the tags to send to the right places. Because the doc structure and tracking are all standardised, every docs instance can use the same container.

Personally, I think option 3 is a pretty cool concept. Consistent with the open source ideology and flexible enough to support a variety of users. Happy to help if we decide to go that route!

Implications

If this data is tracked and loaded in a consistent way for all dbt docs users, it should be possible to create plug-and-play solutions for basic analytics (i.e. a dbt package and a Looker/Tableau/Data Studio dashboard). These could serve as both useful tools for supporting a business and good demonstrations of how dbt works in practice. Additionally, for users (like me) who want to marry this with other data, loading it into a data warehouse enables that while following an ELT approach.


drewbanin avatar drewbanin commented on May 14, 2024 2

I just wanted to pick this thread back up. I've been mulling over some implementation considerations for a little while now. I think the most compelling version of this for users of the docs site would be to allow JavaScript snippets (e.g. a GA tracking pixel, Snowplow tracking code, or a GTM import) in the docs site.

I think that's something we can't readily (or really, don't want to) do. The big issue here is that we run the dbt Docs website in dbt Cloud, and it's a terribly bad idea to allow folks to write custom JS that runs for other users inside of the dbt Cloud application. You could, for instance, make requests to authenticated endpoints on behalf of another user if we allowed arbitrary JS snippets. We also know that some folks are running, or plan to run, the docs site in other hosted applications, so this isn't specifically a dbt Cloud constraint -- it's more that it limits the appropriate deployment models for this site too greatly.

I think the next best thing would be to allow the configuration of the docs website with either:

  • a GA tracking ID
  • a Snowplow collector URL + associated configs (just an app_id, I think)
  • I could also imagine adding support for something like Segment too, but we're less familiar with Segment tracking setups, so maybe we should consider that for a v2 of this feature
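One way to picture this configuration-only approach is a small, closed set of destination types that the docs site validates before use, so a config can name a destination but never carry code. The sketch below is an assumption about what such a config might look like; the `kind`, `trackingId`, `collectorUrl`, and `appId` field names are hypothetical, as are the sample values.

```typescript
// Hypothetical config shape: only known destinations, described as plain data.
type TrackingConfig =
  | { kind: "ga"; trackingId: string }
  | { kind: "snowplow"; collectorUrl: string; appId: string };

// Validate untrusted input; anything that is not a recognised destination is rejected.
function parseTrackingConfig(raw: unknown): TrackingConfig | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (r.kind === "ga" && typeof r.trackingId === "string") {
    return { kind: "ga", trackingId: r.trackingId };
  }
  if (
    r.kind === "snowplow" &&
    typeof r.collectorUrl === "string" &&
    typeof r.appId === "string"
  ) {
    return { kind: "snowplow", collectorUrl: r.collectorUrl, appId: r.appId };
  }
  return null; // unknown kinds (e.g. a script URL) never reach the page
}

// Placeholder values for illustration.
const gaConfig = parseTrackingConfig({ kind: "ga", trackingId: "UA-000000-1" });
const rejected = parseTrackingConfig({ kind: "script", src: "https://example.com/x.js" });
```

Because the parser copies only the expected fields, extra keys in the input are dropped rather than passed through to the page.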

I'd be in favor of prioritizing a change like this for the v0.18.0 (Marian Anderson) release of dbt. We'd likely want to start with a GA integration that fires pageview events (and maybe usage information, like viewing the DAG) to a configured GA account.

From there, we can expand support out for other tools like Segment and Snowplow if folks ask for it!

Y'all buy that?


drewbanin avatar drewbanin commented on May 14, 2024

@jtcohen6 these are good questions! I've seen systems support configuration for GA (and similar) tracking by accepting a tracking ID and auto-tracking events. I bet we could do something like that, but I of course know there are some Snowplow shops using dbt in the world too :)

I don't think we're going to want to allow arbitrary JS injection -- I can see that being pretty brittle. If you know folks that are interested in adding tracking to the docs, I'd love to hear what they think about this here!


drewbanin avatar drewbanin commented on May 14, 2024

Thanks for this really thorough writeup @mferryRV! I really buy what you're saying here.

I think that of the three options you outlined, the first one might be the most tractable for us. I can definitely imagine the docs site pushing events onto a window variable in a structured (and documented) way, then allowing users to slurp up those events however they see fit.

I too really like GTM, but I'm not certain that it's appropriate for every org using dbt docs out in the wild. A solution that lets folks leverage GTM with minimal effort without requiring it feels like a good hybrid approach to me.

I like that you touched on the user id component in dbt Cloud. We definitely do have user ids that we can expose in these events, and there are good ways to map these user ids onto identities of dbt Cloud users.

The following things are super easy to track:

  • pageviews
  • DAG views (with filters)
  • Column detail expansion
  • search queries
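To make the "structured events on a window variable" idea concrete, here is a hedged sketch of a typed event queue covering the four event kinds listed above. The class name, event field names, and sample values are all hypothetical; nothing here reflects an actual dbt-docs implementation.

```typescript
// Hypothetical structured events, one variant per easy-to-track interaction.
type DocsEvent =
  | { type: "pageview"; url: string; userId?: string }
  | { type: "dag_view"; selector: string; tags: string[]; userId?: string }
  | { type: "column_expand"; model: string; column: string; userId?: string }
  | { type: "search"; query: string; userId?: string };

type Listener = (evt: DocsEvent) => void;

// A tiny queue the docs site could expose globally; consumers subscribe however they like.
class DocsEventQueue {
  private readonly events: DocsEvent[] = [];
  private readonly listeners: Listener[] = [];

  push(evt: DocsEvent): void {
    this.events.push(evt);
    this.listeners.forEach((listener) => listener(evt));
  }

  subscribe(listener: Listener): void {
    // Replay earlier events so a consumer that loads late misses nothing.
    this.events.forEach((evt) => listener(evt));
    this.listeners.push(listener);
  }
}

const queue = new DocsEventQueue();
queue.push({ type: "pageview", url: "/#!/overview" }); // made-up URL

const received: DocsEvent[] = [];
queue.subscribe((evt) => {
  received.push(evt); // a real consumer might forward to GA, Segment, or GTM here
});
queue.push({ type: "search", query: "orders" });
```

The subscriber sees both the replayed pageview and the later search event, which is what lets a tracker attach at any point after page load.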

I think scroll depth is comparatively harder to implement. Something like Snowplow implements this out of the box, but I shudder to think about calculating scroll depth in a cross-browser way ourselves. If you're able to share, can you tell me which system you'd likely use to record these events?


nehiljain avatar nehiljain commented on May 14, 2024

We are really interested in this feature as well, but most of the requirements are already discussed in the comments above, @drewbanin, so just a +1 from me.


mferryRV avatar mferryRV commented on May 14, 2024

Thanks for reminding me about this, @nehiljain!

@drewbanin - we would be comfortable using either Segment or GA to track these events.


github-actions avatar github-actions commented on May 14, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.


joellabes avatar joellabes commented on May 14, 2024

Reopening based on more interest in https://discourse.getdbt.com/t/is-user-tracking-possible-in-dbt-docs/6148


github-actions avatar github-actions commented on May 14, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.


github-actions avatar github-actions commented on May 14, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

