Comments (10)
Thanks for creating this issue, @jtcohen6!
Context
If documentation is a differentiator for why dbt outclasses competitive solutions, then we need a way to validate that assertion in retrospect. Doing so with data is both possible and compelling.
From my perspective, I'd like a holistic view on how people are engaging with the data and information my team curates. For our stack, that's:
- Audit logs on BigQuery tables
- Usage logs for Tableau dashboards, and
- Page view logs on documentation
If I see that people are building dashboards on a table but not reading its documentation, then either they're not familiar with the docs or the docs are not valuable. That's useful feedback that I can glean without interrupting my users. It's also actionable because it helps us decide how much time we should spend documenting things.
Requirements
To meet that use case, I would ideally like page view data from all documentation repositories (e.g. Confluence, dbt docs, Notion) loaded into my data warehouse. I could then marry this with the query audit logs and Tableau usage data for the same users to create a holistic picture of their interactions with data.
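As a rough illustration of the kind of question this combined data could answer, here is a minimal TypeScript sketch that finds users who hit a table through dashboards but never opened its documentation. The row shapes (`DashboardUsage`, `DocPageView`) and the function name are assumptions made up for this example; real BigQuery audit logs, Tableau usage logs, and page view logs would need to be mapped onto them first, and in practice this would more likely be a SQL join in the warehouse.

```typescript
// Hypothetical row shapes; real audit-log and page-view schemas will differ.
interface DashboardUsage {
  userEmail: string;
  tableName: string;
}

interface DocPageView {
  userEmail: string;
  tableName: string;
}

// For each table, list the users who used it via dashboards
// but have no recorded page view of its documentation.
function usersWithoutDocViews(
  usage: DashboardUsage[],
  views: DocPageView[],
): Map<string, string[]> {
  // Build a lookup of (user, table) pairs that have at least one doc view.
  const viewed = new Set<string>();
  for (const v of views) {
    viewed.add(JSON.stringify([v.userEmail, v.tableName]));
  }

  const result = new Map<string, string[]>();
  for (const u of usage) {
    if (viewed.has(JSON.stringify([u.userEmail, u.tableName]))) {
      continue;
    }
    const users = result.get(u.tableName) || [];
    if (users.indexOf(u.userEmail) === -1) {
      users.push(u.userEmail);
    }
    result.set(u.tableName, users);
  }
  return result;
}
```

A table with many dashboard users and an empty doc-view list is the signal described above: either the docs are not known, or they are not valuable.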
The data I'm interested in for dbt docs is primarily for the dbt Cloud use case:
- Logged-in user IDs (that map to email addresses or an SSO identity) on every event
- Page view events including full URL
- DAG views, ideally including the command that was used, e.g. +my_model+, any tag filters, and the node on which the user was focused
Nice to have:
- Click events to expand column documentation
- Scroll events when each section enters the viewport (Details, Description, Columns, SQL)
This would be plenty of info for me to see who is aware of the documentation and who is really using the documentation.
Implementation
I'm a fan of Google Tag Manager for decoupling release cycles between marketing analytics and product features. My inclination is to use this as a solution for managing page view tracking.
To enable the event tracking above, you would push events to the dataLayer variable any time something interesting happened, and it would be simple to create triggers in GTM.
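To make the dataLayer idea concrete, here is a minimal TypeScript sketch of what a push could look like. The global `dataLayer` array and the `event` key follow GTM's usual convention; the helper `trackDagView`, the event name `dag_view`, and the field names are assumptions for illustration only and do not exist in dbt docs.

```typescript
// An entry on the GTM data layer: an `event` name plus arbitrary fields.
type DataLayerEvent = { event: string; [key: string]: unknown };

// GTM reads from a global array named `dataLayer`.
// Create it if the container snippet has not done so yet.
const root = globalThis as unknown as { dataLayer?: DataLayerEvent[] };
if (!root.dataLayer) {
  root.dataLayer = [];
}
const dataLayer: DataLayerEvent[] = root.dataLayer;

// Hypothetical helper the docs site could call whenever a DAG view is rendered.
function trackDagView(selector: string, focusedNode: string, tags: string[]): void {
  dataLayer.push({ event: "dag_view", selector, focusedNode, tags });
}

// Example: the user opened the DAG with the selector +my_model+.
trackDagView("+my_model+", "my_model", ["nightly"]);
```

A GTM custom event trigger matching `dag_view` could then fire whichever tags the container owner has configured.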
But loading a GTM container gives the opportunity for arbitrary JS injection, so there are a few possibilities:
- Free-for-all: Docs functionality could be modified to provide a configuration option for the GTM container ID to be loaded on the page. We could create an example GTM container with all the variables & triggers pre-built. Users would then import that example container and add their own tags. To guide implementation of those tags, it's possible to create tag templates that simplify configuration of otherwise custom HTML tags. These are very useful for things like the Facebook pixel, but would also be a great solution for guiding particular tracking implementations to work with the existing variables and triggers.
- Managed Containers: For dbt Cloud users, provide a GTM container ID and grant clients user access. Releases could be approved by the Fishtown team. Clients would have edit access to propose changes.
- Universal Container: Allow configuration of a universally applicable GTM container. Fishtown (or the broader community) would maintain a GTM container with tags that send data to your favourite trackers (GA, Segment, Snowplow). Users would supply a destination for pre-built tracking tags through configuration options in the dbt repo. That configuration would populate a GTM variable that configures the tags to send to the right places. Because the doc structure and tracking are all standardised, every docs instance can use the same container.
Personally, I think option 3 is a pretty cool concept. It's consistent with the open source ideology and flexible enough to support a variety of users. Happy to help if we decide to go that route!
Implications
If this data is tracked and loaded in a consistent way for all dbt docs users, it should be possible to create plug-and-play solutions for basic analytics (i.e. a dbt package and a Looker/Tableau/Data Studio dashboard). These could serve as both useful tools for supporting a business and good demonstrations of how dbt works in practice. Additionally, for users (like me) who want to marry this with other data, loading it into a data warehouse enables that while following an ELT approach.
from dbt-docs.
I just wanted to pick this thread back up. I've been mulling over some implementation considerations for a little while now. I think the most compelling version of this for users of the docs site would be to allow JavaScript snippets (e.g. a GA tracking pixel, or Snowplow tracking code, or a GTM import...) in the docs site.
I think that's something we can't readily (or really, don't want to) do. The big issue here is that we run the dbt Docs website in dbt Cloud, and it's a terribly bad idea to allow folks to write custom JS that runs for other users inside of the dbt Cloud application. You could, for instance, make requests to authenticated endpoints on behalf of another user if we allowed arbitrary JS snippets. We also know that some folks are running, or plan to run, the docs site in other hosted applications, so this isn't specifically a dbt Cloud constraint -- it's more that it limits the appropriate deployment models for this site too greatly.
I think the next best thing would be to allow the configuration of the docs website with either:
- a GA tracking ID
- a Snowplow collector URL + associated configs (just an app_id, I think)
- I could also imagine adding support for something like Segment too, but we're less familiar with Segment tracking setups, so maybe we should consider that for a v2 of this feature
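To sketch what "configuration instead of arbitrary JS" might look like, here is a small TypeScript example of a config type that only accepts a known provider plus its identifiers. Everything here is hypothetical: the `TrackingConfig` shape, the key names, and `validateTrackingConfig` are assumptions for illustration, not an existing dbt setting.

```typescript
// Hypothetical config shapes; none of these keys exist in dbt today.
type TrackingConfig =
  | { provider: "ga"; trackingId: string }
  | { provider: "snowplow"; collectorUrl: string; appId: string };

// Returns a list of problems; an empty list means the config looks usable.
function validateTrackingConfig(config: TrackingConfig): string[] {
  const problems: string[] = [];
  switch (config.provider) {
    case "ga":
      if (config.trackingId.trim() === "") {
        problems.push("trackingId is empty");
      }
      break;
    case "snowplow":
      // Only a plain https endpoint is accepted, never a script snippet.
      if (!/^https:\/\/\S+$/.test(config.collectorUrl)) {
        problems.push("collectorUrl must be an https URL");
      }
      if (config.appId.trim() === "") {
        problems.push("appId is empty");
      }
      break;
  }
  return problems;
}
```

Because the config carries only identifiers and endpoints, the docs site stays in control of which code actually runs, which is the point of avoiding user-supplied snippets.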
I'd be in favor of prioritizing a change like this for the v0.18.0 (Marian Anderson) release of dbt. We'd likely want to start with a GA integration that fires pageview events (and maybe usage information, like viewing the DAG) to a configured GA account.
From there, we can expand support out for other tools like Segment and Snowplow if folks ask for it!
Y'all buy that?
from dbt-docs.
@jtcohen6 these are good questions! I've seen systems support configuration for GA (and similar) tracking by accepting a tracking ID and auto-tracking events. I bet we could do something like that, but I of course know there are some Snowplow shops using dbt in the world too :)
I don't think we're going to want to allow arbitrary JS injection -- I can see that being pretty brittle. If you know folks that are interested in adding tracking to the docs, I'd love to hear what they think about this here!
from dbt-docs.
Thanks for this really thorough writeup @mferryRV! I really buy what you're saying here.
I think that of the three options you outlined, the first one might be the most tractable for us. I can definitely imagine the docs site pushing events onto a window variable in a structured (and documented) way, then allowing users to slurp up those events however they see fit.
I too really like GTM, but I'm not certain that it's appropriate for every org using dbt docs out in the wild. A solution that lets folks leverage GTM with minimal effort without requiring it feels like a good hybrid approach to me.
I like that you touched on the user id component in dbt Cloud. We definitely do have user ids that we can expose in these events, and there are good ways to map these user ids onto identities of dbt Cloud users.
The following things are super easy to track:
- pageviews
- DAG views (with filters)
- Column detail expansion
- search queries
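For what it's worth, the four event types above could be given a structured, documented shape along these lines. This is only a sketch in TypeScript: the `DocsEvent` fields, the `DocsEventBus` class, and the idea of replaying history to late subscribers are all assumptions for illustration, not how the docs site works today.

```typescript
// Hypothetical event schema covering the four interactions listed above.
type DocsEvent =
  | { type: "pageview"; url: string }
  | { type: "dag_view"; selector: string; tags: string[] }
  | { type: "column_expand"; model: string; column: string }
  | { type: "search"; query: string };

type Listener = (event: DocsEvent, userId: string) => void;

class DocsEventBus {
  private readonly history: Array<{ event: DocsEvent; userId: string }> = [];
  private readonly listeners: Listener[] = [];

  // Record an event and fan it out to every registered listener.
  emit(event: DocsEvent, userId: string): void {
    this.history.push({ event, userId });
    for (const listener of this.listeners) {
      listener(event, userId);
    }
  }

  // Register a listener and replay earlier events,
  // so a tracker that loads late still sees everything.
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
    for (const entry of this.history) {
      listener(entry.event, entry.userId);
    }
  }
}
```

A consumer, whether GTM, GA, Segment or Snowplow glue code, would call `subscribe` once and forward each event in whatever format its tracker expects.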
I think scroll depth is comparatively harder to implement. Something like Snowplow implements this out of the box, but I shudder to think about calculating scroll depth in a cross-browser way ourselves. If you're able to share, can you tell me which system you'd likely use to record these events?
from dbt-docs.
We are really interested in this feature as well, but most of the requirements are already discussed in the comments above, @drewbanin, so just a +1 from me.
from dbt-docs.
Thanks for reminding me about this, @nehiljain!
@drewbanin - we would be comfortable using either Segment or GA to track these events.
from dbt-docs.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
from dbt-docs.
Reopening based on more interest in https://discourse.getdbt.com/t/is-user-tracking-possible-in-dbt-docs/6148
from dbt-docs.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
from dbt-docs.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
from dbt-docs.