mozilla / gud

Mozilla Growth & Usage Dashboard, pronounced "Good"

Home Page: https://gud.telemetry.mozilla.org

CSS 1.77% HTML 0.82% JavaScript 34.43% Dockerfile 0.23% Svelte 62.74%

gud's Introduction

Growth and Usage Dashboard

This is a light, server-powered dashboard showing the smoot growth metrics. The frontend talks to a tiny node server by passing it the segments / usage criteria / etc. necessary for the query, and the tiny web server sends the query to be run by BigQuery.

Community

Post in #gud on Slack for any other questions.

Reporting Issues and Feature Requests

Feel free to file an issue in this repository w/ questions / concerns.

Development

Dependencies:

– Node 11.5.0 / current NPM version

To install:

  • Make sure you have Node / npm.
  • Run npm install in the directory where you cloned this repository.

To run locally:

The GCP commands in these instructions will not work unless you work under Katie Parlante. If you want to run this project and you don't work under Katie Parlante, please contact Jason Thomas or Blake Imsland.

  • Run gcloud auth application-default login
  • Run gcloud config set project moz-fx-data-shared-prod
  • To run the server, run node server which starts a tiny web server on port 3000 (go to localhost:3000 in your browser).
  • To build / update the frontend, run npm run dev, which spins up another web server (one we're not going to use, sorry for the redundancy here) and builds the little dev version of the frontend. I'll make it so you don't have to run two servers like this at some point, but this works for now!

gud's People

Contributors

hamilton, jklukas, mikaeld, openjck, robotblake, wlach

Forkers

wlach openjck

gud's Issues

Update desktop usage criteria

We should rename the usage criteria "Any Firefox Activity" to "Any Desktop Firefox Activity" to make clearer that it is desktop-only. I believe it is already the latter in the backend tables.

Mentions
@hamilton

migrate to a global Svelte store + immer

Once we have an updated roadmap in place, it would be worthwhile to revisit / harmonize how we're handling the server + frontend so any engineer who works on this project can follow a unified set of design principles for similar dashboards. Step 1 of that is to migrate the store handling to use immer. There are some simple patterns being built out that I'll link to here once they're written up.

Support multiple series in Explore Mode

I would like to be able to support multiple series on a single plot in Explore Mode. An example use case is, knowing that a metric is moving in some slice (say Mac OS), to see how the movement looks in key slices across another dimension (say country).

I'm open to suggestions for the UI, but perhaps we could put the dimension selectors (currently platform, country, and channel) in a tabbed pane, with a widget next to the tabs to add a new tab (starting with just one tab). Each tab would allow selection of the slice for a distinct series, and each tab could have a color corresponding to the plotted series.

An example plot with multiple series:

[image: example plot with multiple series]

design and implement body controls

(1) move the date selector over to the top left of the body
(2) add a "filters: COUNTRY US x GB x"-type display in the body
(3) leave room for other view selection controls

support somewhere in the UX "comparing slices"

We will also want to support the notion of e.g. us VS ca VS gb specifically, which is distinct from what is in mind for "Compare" mode, where you define slices for two distinct groups, then compare the output of those.

Change "Intensity" metric name to "Average Days Per Week"

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] Please don't include screenshots or specific numbers. This issue will be publicly viewable, but the GUD data itself is under NDA.

Consensus is that the "Intensity" name is confusing.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Change name to "Average Days Per Week"

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Mentions
Add mentions for anybody you'd like to make aware of this issue; likely @hamilton, @jmccrosky, @klukas, and @openjck.

Add some segments to GUD, once they are defined

This ticket is a "heads up" about a possible future request, and is not a direct request yet.

I'm working on developing some canonical user segments for desktop. The goal is to find segments that include/exclude sets of similar clients, so that we can reduce the impact of confounding variables when analysing data.

For example, we'll likely want to isolate "activated" users from non-activated users, for some definition of activated, and by studying the retention of "activated" users we can remove the effects due to bots and computer labs that create short term profiles. Or we might want to study the properties of heavy users.

We would like these segments to be available on GUD, as well as in mozanalysis, and have example queries on DTMO to help people use the common set of segments in manual queries too.

The most eccentric part of this is that I would like the freedom to iterate on the segments, so that we can start using segments soon and tweak them as we learn what's useful. I imagine having version numbers in the names of segments until they're stable, and I imagine the old versions being replaced by the new versions (so presumably GUD would only need the most recent version or two, but the version number should be included in the segment name).

I am starting my search for segments by using features from clients_last_seen, and building heuristics that can be represented in a SQL SELECT expression. In the long term we might want to move past this and involve ML in deciding which clients fit which segments, but that feels a long way away and there's a lot of value we can unlock in the meantime.

Here are some example segments that give a flavour of what we might want to look at:

  1. Users who visited/didn't visit 5 URIs on a day 7-13 days before submission_date
  2. Users who visited at least 213 URIs on submission_date
  3. Users who visited x URIs in period y before submission_date
  4. Users from Tier 1 countries

For each segment, we want to be able to plot MAU, DAU, retention rates, etc - the full range of metrics.

Describe the solution you'd like
When I provide some segment definitions (e.g. as a PR to bigquery-etl), I would like the GUD front end to allow people to filter graphs to include or exclude users that fit a certain segment. Some segments will come in pairs ("included by the criteria"/"excluded by the criteria"). Others may have multiple levels (e.g. "low usage"/"medium usage"/"high usage"). Comparing included/excluded users will be a common use case.

It seems like some segments will fit under "Product / usage criteria", some might fit under "Country", and others might require their own dropdown?

Describe alternatives you've considered
Still working out the main proposal, haven't got to the point of multiple alternatives yet!

Additional context
Proposal document, where I guessed that implementing segments like "visited 5 uris on a day 7-13 days before submission_date" might take 1-2 weeks from the day I submit a PR to bigquery-etl, and where I pointed out that "time estimations are plucked from my gut and involved no consultations".

Mentions
@hamilton, @jmccrosky, @klukas, and @openjck.

Change name/branding

From "Smoot" to "Mozilla Growth & Usage Dashboard" or "GUD" or "MGUD" or "M-GUD" or something ;)

move to hash-based routing for views

This entails moving from ?mode=explore to /#explore, for instance. Each of these hash-based views is likely to have different query params / other possible routes that need to be split up.
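As a rough illustration of the idea, here is a minimal sketch of splitting a location hash into a view name and its query params. The function name parseHashRoute and the default view are assumptions made for this example, not code from the repository.

```javascript
// Hypothetical helper (not from the repository): turn a hash such as
// "#explore?metric=all" or "#/explore/?metric=all" into { view, params }.
function parseHashRoute(hash, defaultView = "explore") {
  // Drop the leading "#" and an optional "/" right after it.
  const cleaned = hash.replace(/^#\/?/, "");
  const queryStart = cleaned.indexOf("?");
  const path = queryStart === -1 ? cleaned : cleaned.slice(0, queryStart);
  const search = queryStart === -1 ? "" : cleaned.slice(queryStart + 1);
  // Ignore trailing slashes so "#/explore/" and "#explore" resolve alike.
  const view = path.replace(/\/+$/, "") || defaultView;
  // Each view can then interpret its own params independently.
  const params = Object.fromEntries(new URLSearchParams(search));
  return { view, params };
}
```

In the browser this would typically be called with window.location.hash, and again whenever the hashchange event fires.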

implement a multiselector

This will be used for dimensions such as country (where a user might want to select multiple countries for aggregation / comparison).

integrate query hitting telemetry.smoot_usage_all_v1 into server.js

The query

SELECT
  `date`,
  usage,
  SUM(dau) AS dau,
  SUM(wau) AS wau,
  SUM(mau) AS mau,
  SAFE_DIVIDE(SUM(active_days_in_week),
    SUM(wau)) AS intensity,
  SAFE_DIVIDE(SUM(active_in_week_1),
    SUM(new_profiles)) AS retention_1_week_new_profile,
  SAFE_DIVIDE(SUM(active_in_weeks_0_and_1),
    SUM(active_in_week_0)) AS retention_1_week_active_in_week_0
FROM
  telemetry.smoot_usage_all_v1
WHERE true
  AND `date` = '2019-05-01'
GROUP BY
  usage,
  `date`
ORDER BY
  1, 2

was presented to me. Let's make sure everything is understood easily before using this.
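One possible way to wire this into server.js is to pass the date as a named query parameter rather than hard-coding it. The sketch below only builds the options object. The name buildUsageQuery is hypothetical, and the { query, params } shape is an assumption about what the BigQuery client in use accepts.

```javascript
// Sketch only: buildUsageQuery is a hypothetical name, not repository code.
// It returns the query above with the date as the named parameter @date.
function buildUsageQuery(date) {
  // Reject anything that is not a plain YYYY-MM-DD string.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`expected a YYYY-MM-DD date, got "${date}"`);
  }
  // Depending on the column type, an explicit CAST(@date AS DATE) may be needed.
  const query = `
SELECT
  \`date\`,
  usage,
  SUM(dau) AS dau,
  SUM(wau) AS wau,
  SUM(mau) AS mau,
  SAFE_DIVIDE(SUM(active_days_in_week),
    SUM(wau)) AS intensity,
  SAFE_DIVIDE(SUM(active_in_week_1),
    SUM(new_profiles)) AS retention_1_week_new_profile,
  SAFE_DIVIDE(SUM(active_in_weeks_0_and_1),
    SUM(active_in_week_0)) AS retention_1_week_active_in_week_0
FROM
  telemetry.smoot_usage_all_v1
WHERE
  \`date\` = @date
GROUP BY
  usage,
  \`date\`
ORDER BY
  1, 2`;
  return { query, params: { date } };
}
```

Keeping the date out of the SQL string means the frontend-supplied value is never concatenated into the query text.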

Allow focus on a single metric

We would like to support showing just a single metric using the full content pane (to make the graph as large as possible). This could mean suppressing rendering of other graphs or could just be some sort of zoom function in which case the other metric graphs would be just a scroll away.

Dates are off by one

Describe the bug
The dates attached to values in GUD appear to be one day before the submission_date associated with the values.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://growth-stage.bespoke.nonprod.dataops.mozgcp.net/?endDate=2020-03-01&mode=explore&usage=Any%20Firefox%20Desktop%20Activity&attributed=%5B%5D&metric=all&os=%5B%5D&language=%5B%5D&country=%5B%5D&channel=%5B%5D&startDate=2020-02-02
  2. Mouse over the last data point on the WAU graph. Even though the end date is specified as 2020-03-02, the last point shown in the mouseover is marked as 2020-03-01
  3. Note the WAU value shown for 2020-03-01 and compare to the query below; it exactly matches the value in the query for submission_date = '2020-03-01'
SELECT
  submission_date,
  COUNTIF(days_since_seen < 7) AS wau
FROM
  `moz-fx-data-shared-prod.telemetry.clients_last_seen`
WHERE
  submission_date >= '2020-03-01'
GROUP BY
  1
ORDER BY
  1

Expected behavior
The dates shown on the graphs should match the submission dates associated with the values.

Mentions
@hamilton, @jmccrosky

redesign the GUD url scheme

GUD needs a URL scheme that is shorter and a bit easier to use. This could open the door to much, much more expressive querying and utility on the client and server sides, including arbitrary comparisons between sets of query params. I'm sure this is reinventing someone's wheel, but if done right we might be able to keep going without a more involved server component for GUD for another few years.

A few improvements:

(1) remove empty kv pairs such as country=[]. We can infer if it is empty that a default value will be set.
(2) consider a short value for each key: a single alphanumeric character, for instance US=>u. This is more meaningful when we can reduce something like All Firefox Desktop Activity to x. These alphanumeric shorts are unique to the dimension or metric. a-zA-Z0-9 contains 62 values, more than enough to represent almost all these dimensions going forward. In the case of usage criteria, a dimension which could go beyond 62 values, we could easily use two alphanumerics, yielding 3,844 values, or go with three (238,328 values), just to be safe. In any case the reduction will still be pretty considerable, and throwing out a delimiter like a comma here keeps the length short.
(3) dimension names can be short and dependent on the usage criterion specifically, reducing even further. If we follow (2) above, then we can make any dimension listing delimited by something like -, leaving something like the full country specification to be Cugb0, where C means country, and the rest of the alphanumerics represent individual countries.
(4) we can leave in startDate and potentially other view filters as-is, since they are not specific to the data itself. For dates, we could easily have sd414, representing the number of days since Jan 01 2015 or something like that. We can also change something like mode=explore to just be a hash-route.

examples of compression

1

?startDate=2017-06-17&endDate=2020-04-04&mode=explore&usage=Any Firefox Desktop Activity&attributed=[]&metric=all&os=[]&language=[]&country=[]&channel=[] (153 chars)

#explore/?sd905&ed1032&v=e&q=Ufda (33 chars, ~21% the size of original)

2

/?startDate=2017-06-17&endDate=2020-04-04&mode=explore&usage=Any Firefox Desktop Activity&attributed=["TRUE"]&metric=all&os=["Windows_NT"]&language=[]&country=["DE"%2C"GB"%2C"US"]&channel=[] (190 chars)

#/explore/?sd905&ed1032&q=Ufda-AL-Ow-CdgU (41 chars, ~21% of original)
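To make two of these ideas concrete, here is a minimal sketch of dropping empty key/value pairs and of encoding dates as a day count. The epoch of 2015-01-01 (UTC) and all function names are assumptions for illustration. The proposal leaves the exact epoch open, so the numbers produced here need not match the sd/ed values in the examples above.

```javascript
// Assumed epoch for day counts: 2015-01-01 UTC (the proposal leaves this open).
const EPOCH_MS = Date.UTC(2015, 0, 1);
const DAY_MS = 24 * 60 * 60 * 1000;

// "2015-01-02" -> 1. Dates are treated as UTC to avoid timezone drift.
function encodeDay(isoDate) {
  const [year, month, day] = isoDate.split("-").map(Number);
  return Math.round((Date.UTC(year, month - 1, day) - EPOCH_MS) / DAY_MS);
}

// Inverse of encodeDay: 1 -> "2015-01-02".
function decodeDay(days) {
  return new Date(EPOCH_MS + days * DAY_MS).toISOString().slice(0, 10);
}

// Remove pairs whose value is empty or an empty JSON array ("[]"),
// on the assumption that the reader of the URL falls back to defaults.
function dropEmptyParams(search) {
  const kept = new URLSearchParams();
  for (const [key, value] of new URLSearchParams(search)) {
    if (value !== "" && value !== "[]") kept.append(key, value);
  }
  return kept.toString();
}
```

For example, dropEmptyParams applied to a query string with os=[] and country=[] keeps only the keys that carry a value, which already removes much of the length in example 1.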

clean up menu selectors

The menu selectors as they are right now were half-implemented to get a POC together. Using the HTML select element, however, is obviously limiting, so we will need to implement a radiobox-like thing (a multi-select dropdown).

implement usage criterion-specific channel specifications (eg for fenix / fennec)

This should be fairly easy to implement. In options.json, for the usage criteria listed below, we should put a new channels option (instead of the flag that disabled the channels) with an array of string values:

Fennec Android: release, beta, Other, nightly, and aurora
Fennec iOS: release, Other, and beta
Focus Android: release, nightly, beta, and Other

Then the channel selector should easily be able to read these values from options.json.
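A minimal sketch of what this could look like follows. The shape of the options object (a usage array with label and channels keys) and the helper name channelsFor are assumptions for illustration. The real options.json may be structured differently.

```javascript
// Hypothetical fragment standing in for options.json; key names are assumed.
const options = {
  usage: [
    { label: "Fennec Android", channels: ["release", "beta", "Other", "nightly", "aurora"] },
    { label: "Fennec iOS", channels: ["release", "Other", "beta"] },
    { label: "Focus Android", channels: ["release", "nightly", "beta", "Other"] },
  ],
};

// Return the channel list for a usage criterion, or null when the criterion
// is unknown or defines no channels (the caller can then use its default list).
function channelsFor(opts, usageLabel) {
  const entry = opts.usage.find((u) => u.label === usageLabel);
  return entry && Array.isArray(entry.channels) ? entry.channels : null;
}
```

The channel selector would then call something like channelsFor(options, currentUsage) whenever the usage criterion changes.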

delay running query until user hits a "run query" button?

Now that we have a MultiSelector, it's a bit too easy to start, say, 5-6 queries by selecting a bunch of countries. I could see this both causing the server to bottleneck with a bunch of extraneous queries & also possibly cost a bunch of money. Here are some options:

(1) delay running the query for k seconds: this allows for other quick selections before query automatically runs
(2) require user to hit a "run query" button: this would allow a user to select a whole bunch of things, then hit "run query" or something like that. In theory we could also have a "cancel changes" button if they want to go back to the last run query. There is probably a history component that could be used to page between the different (cached) queries a user has run.
(3) forget about this entirely, since this never ends up being that expensive: I think Jeff could answer this question better than I could. If the tables we're querying against are relatively small and cheap, then let's not sweat it too much. The node server is mostly awaiting Promises to resolve, so there isn't much additional weight against the server, I think.
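Option (1) amounts to debouncing the query. Here is a minimal sketch, assuming the query is triggered by a single callback. The names debounce, schedule, and cancel are illustrative, and the timer functions are injectable only so the behaviour can be checked without real delays.

```javascript
// Minimal debounce sketch (not repository code): `run` fires only after
// `waitMs` milliseconds pass with no further calls. Each new call cancels
// the pending one, so a burst of selections produces a single query.
function debounce(
  run,
  waitMs,
  schedule = (fn, ms) => setTimeout(fn, ms),
  cancel = (id) => clearTimeout(id)
) {
  let pending = null;
  return function debounced(...args) {
    if (pending !== null) cancel(pending);
    pending = schedule(() => {
      pending = null;
      run(...args); // only the arguments of the latest call are used
    }, waitMs);
  };
}
```

With k seconds as the wait, selecting several countries in quick succession would start one query for the final selection instead of one per click.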

Implement date selectors

We would like to have a date range picker to allow the graphs to display a limited date range.

add shortcuts / hotkeys / interactions guide

There are actually some great features we can easily implement w/ graph-paper around shift+click, click & drag, etc that would enable all sorts of great ways of comparing things. We have some of this today in GUD but it is invisible to the end user.

We can show a list of commands or a button that pops up a command list.
