kpi-dashboard's People

Contributors

nmalkin


Forkers

mozilla

kpi-dashboard's Issues

How to meaningfully display rare buckets in segmentations?

For example: there are lots of different browsers, so trying to display all of them in a report would render it incomprehensible. To deal with this, only the biggest buckets are displayed (e.g., only the top 5 browsers). The rest are grouped into an Other bucket.

This is a problem because it is the more obscure browsers and operating systems that are likely to have issues. We should consider having a way to break apart the Other category and display its constituents (in a way that actually provides insight).
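One way to structure this in the front end (a sketch, not existing code): compute the top-N buckets as before, but keep Other's constituents around so a drill-down view could expand them later.

```javascript
// Sketch: keep the N largest buckets and fold the rest into "Other", while
// retaining Other's constituents so it can be broken apart on demand.
// `counts` maps a bucket name (e.g., browser) to its size.
function topBuckets(counts, n) {
    var entries = Object.keys(counts).map(function(name) {
        return { name: name, count: counts[name] };
    });
    entries.sort(function(a, b) { return b.count - a.count; });

    var other = entries.slice(n);
    return {
        top: entries.slice(0, n),
        other: {
            total: other.reduce(function(sum, e) { return sum + e.count; }, 0),
            constituents: other // kept around for a future drill-down view
        }
    };
}
```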

Report: percentage of new users successfully signed in

For Dan. Like #2, but the only number being visualized is the percentage of new users who were able to sign in successfully (no visualization of flow through intermediate steps).

Since it's just one number being visualized, it can fully use the framework for visualizing #1.

Report: can users log in/sign up?

from Bugzilla #746233

Can users log in/sign up?

  • a. How far do failed attempts get?
    Funnel analysis of the process, showing the % of users who make it to each step of the happy path. This will identify drop-off points and UX hotspots.
  • b. % of users who successfully validate the email
  • c. What percentage of interactions successfully complete?
  • d. How long do successful interactions take in total?

Persistence of state

Now that we're more certain that the current visualizations are staying, there can be more of an effort to make their state persist on page reload/reappearance.

For example, if the page is reloaded, we can preserve the last selected date range, remember the chosen segments, etc.

The big question is: is this necessary? The metric dashboard doesn't do that, for example.
We should get input from Crystal/Dan: to what extent do they want state to persist (always? never? only when loaded from permalinks?)

Report: median number of sites a user logs into with Persona

from Bugzilla #746231

KPI Dashboard's first Key Performance Indicator report will be:

Median # of sites a user logs into with Persona

Description: How well are we doing on Persona ID uptake? On average, how many sites does our user interact with over the day? This value should climb over time and stabilize at a healthy value.

A report should have the following common features:

  • Show current value and related information
  • Allow segmenting across these axes:
    • buckets by number of emails
    • language code
    • buckets by screen size (mobile/tablet/desktop)
  • Express this number over time, so changes can be correlated with releases

Common features will show up in other KPI reports

Biased average values

Report #1 presents the average number of sites a user logs into with Persona (as of #29, it is the mean, not the median). The value is computed as the mean of all values of the number_sites_signed_in KPI across all data points on a given day.

Problem

Consider this sequence of operations by a single user:

  1. No-op (user opens dialog, maybe enters email, but doesn't succeed at logging in)
  2. No-op
  3. No-op
  4. Log in to site 1
  5. Log in to site 2
  6. Log in to site 3
  7. Log in to site 4

To compute the mean value, we will take the sum of all values of number_sites_signed_in (0+0+0+1+2+3+4=10) and divide by the total number of data points (7) to get a mean value of about 1.43, while the correct value is, of course, 4.

In general, the problem is that multiple interactions by a single user are treated as equivalent to a single interaction by multiple users.

One way to account for this would be to try to aggregate the data points by user (i.e., figure out which data points came from the same person) and use only the maximum value. However, this is costly and has undesirable privacy implications.

Another way to handle it would be to weight higher values of number_sites_signed_in so that a value of n supersedes the n−1 that preceded it. This is equivalent to saying, "oh, I just saw a 2. That means I also saw a 1, but that 1 shouldn't count." This is a more sensible approach, but note that it wouldn't fully correct the bias in the example above; nor (for the same reason) can it account for repeated sign-ins to the same site.
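Under one reading of that scheme (an assumption: each data point with value v implies one intermediate data point with value v−1 from the same user), the correction can be sketched as:

```javascript
// Sketch of the weighting idea: a data point with value v+1 marks one of the
// v-valued points as an intermediate step that shouldn't count on its own.
// `countOf[v]` = number of data points observed with number_sites_signed_in === v.
function estimatedUserMean(countOf) {
    var values = Object.keys(countOf).map(Number).sort(function(a, b) { return a - b; });
    var users = 0, total = 0;
    values.forEach(function(v) {
        if (v === 0) return; // no-op interactions say nothing about sites
        var next = countOf[v + 1] || 0;
        var finals = Math.max(countOf[v] - next, 0); // points whose user stopped at v
        users += finals;
        total += finals * v;
    });
    return users > 0 ? total / users : 0;
}
```

On the single-user sequence above (three 0s, then 1, 2, 3, 4) this yields 4, but it still cannot distinguish one user producing 1, 2 from two users producing 1 and 2.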

One more possibility is to do nothing, since we keep saying that this is not a very meaningful metric and we only care about its derivative. This is obviously the easiest, though we would probably have to stop calling it "average number of sites logged in."

Cache data, reports on the backend

Right now, each new request to the backend results in data retrieval and report re-calculation. There is a lot of potential for caching here.
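For example (a sketch, with a made-up five-minute TTL), report results could be memoized by request parameters on the backend:

```javascript
// Sketch: memoize computed reports by request parameters, with a simple
// time-based expiry so fresh data eventually shows up.
var cache = {};
var TTL_MS = 5 * 60 * 1000; // hypothetical: five minutes

function cachedReport(params, compute) {
    var key = JSON.stringify(params);
    var hit = cache[key];
    if (hit && Date.now() - hit.at < TTL_MS) {
        return hit.value; // served from cache, no recalculation
    }
    var value = compute(params);
    cache[key] = { at: Date.now(), value: value };
    return value;
}
```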

Add date pickers to "sites logged in" report

Right now, the new user flow report has both a slider and a date picker available to select the date range, but the sites logged in report (and sign-in attempts) has only the slider. For ease of selection, and consistency, these reports should have date pickers as well.

Holes in data break user flow over time

When there are holes in the data (i.e., there is not a data point for every single day), the new user flow over time report (#18) breaks (problem parsing SVG with D3, no data written or rendered).
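One possible workaround (a sketch; assumes dates are ISO yyyy-mm-dd strings) is to pad the series with zero-valued points before handing it to the chart:

```javascript
// Sketch: fill missing days with zero-valued points so the charting code
// always receives exactly one point per day in the range.
function fillHoles(byDate, startDate, endDate) {
    var out = [];
    var day = new Date(startDate + "T00:00:00Z");
    var end = new Date(endDate + "T00:00:00Z");
    while (day <= end) {
        var key = day.toISOString().slice(0, 10); // back to yyyy-mm-dd
        out.push({ date: key, value: byDate[key] || 0 });
        day.setUTCDate(day.getUTCDate() + 1);
    }
    return out;
}
```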

Axes and hover details disappear after empty series is displayed

To reproduce:

  1. Open New user flow report
  2. Select from and to dates to be equal and reload if necessary so that all data goes away
  3. Expand date range

Data will reappear, but not the axes and hover details.

The problem
Rickshaw falls apart when no data is displayed. 181851c introduced a fix for this that catches the exception when the graph is being updated and redraws it (from scratch). However, the axes and hover details are not being recreated.

The solution
will be to recreate the appropriate axes based on the report. Or to figure out why Rickshaw's breaking on empty series. But, you know, probably the former.

Avoid unnecessary requests: use already-downloaded data

If the user requests data for a particular date range, and we already have that data (e.g., user wants to see June, and we're displaying May through July right now), there's no reason to issue a new request to the server and re-download that data.
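A minimal check could look like this (a sketch; relies on ISO yyyy-mm-dd date strings comparing correctly as plain strings):

```javascript
// Sketch: only hit the server when the requested range isn't already covered
// by the range we have in memory. Dates are yyyy-mm-dd strings, which order
// correctly under lexicographic comparison.
function needsRequest(cached, requested) {
    if (!cached) return true; // nothing downloaded yet
    return requested.start < cached.start || requested.end > cached.end;
}
```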

Permalinks to dashboard state

You can already link to individual reports, but it would be helpful if you could link to a particular version of the report (e.g., with a certain date range, or segmentation, selected).
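A sketch of one approach (state keys are hypothetical): serialize the report state into the URL fragment and restore it on load.

```javascript
// Sketch: round-trip report state (date range, segmentation, ...) through
// the URL fragment so the current view can be shared as a permalink.
function stateToHash(state) {
    return "#" + Object.keys(state).map(function(k) {
        return encodeURIComponent(k) + "=" + encodeURIComponent(state[k]);
    }).join("&");
}

function hashToState(hash) {
    var state = {};
    hash.replace(/^#/, "").split("&").forEach(function(pair) {
        if (!pair) return;
        var parts = pair.split("=");
        state[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1]);
    });
    return state;
}
```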

Reorganize front-end code

Right now it's a mess of shared functions, objects, and function pointers. A more object-oriented approach (a report class, for instance), or maybe models, might make it cleaner.

Segment data by screen size

Report requirement for #1 requests "buckets by screen size (mobile/tablet/desktop)".

Clarifying questions:

  • Do we want the buckets to actually be mobile/tablet/desktop or do we want more fine-grained buckets (actual screen resolution)?
  • If it's the former, should determination be made based on screen resolution alone, or OS and browser as well?
  • If we want just screen resolution, do we display all resolutions, or just the top X?
    (This is what we do for other segmentations, but it seems less meaningful for screen resolution.)

Use mean instead of median for report #1

Report #1 is the median number of sites a user logs into with Persona.

As part of migrating to CouchDB as the backend (#27), finding the median of the data series becomes a significantly harder technical challenge. (To do it in a map/reduce framework requires a quick-select algorithm, which there doesn't seem to be a good way to do in CouchDB.)

Alternately, the median value for each day could be precalculated when data arrives and then stored in the database. However, this would require either a new database (cumbersome) or a change to the data format and code of the current one (very undesirable).

Calculating the mean of the dataset, however, is much easier.
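The asymmetry is easy to see in code. A sketch (not the actual view in this repo) of a mean-friendly reduce, whose partial results compose cleanly under rereduce, which the median cannot do:

```javascript
// Sketch: the mean fits map/reduce naturally because a reduce only needs a
// running sum and count, and rereduce can combine those partial results.
var meanReduce = function(keys, values, rereduce) {
    if (rereduce) {
        // values are {sum, count} pairs from earlier reductions
        return values.reduce(function(acc, v) {
            return { sum: acc.sum + v.sum, count: acc.count + v.count };
        }, { sum: 0, count: 0 });
    }
    return {
        sum: values.reduce(function(a, b) { return a + b; }, 0),
        count: values.length
    };
};
// final mean = result.sum / result.count
```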

While the median is a more sensible value to look at (it is less sensitive to outliers), it has been agreed before that this entire report is not hugely meaningful. The median value itself doesn't really say anything. The only way we'd use it is to watch the number and hope it trends up. In that case, however, the mean is just about as good: we can look at it and watch its trend.

Therefore, with @jedp, we have resolved to use the mean, instead of the median, for this report.

If anyone cares, we can discuss it here, and revisit this when there's more time.

Segment data by number of emails

Per specification of #1 and input from Crystal, we want buckets by number of emails (i.e., how many emails are associated with this person's Persona account?)

  • No emails (new user)
  • 1 email
  • 2 emails
  • 3 or more emails

Note that this is not meaningful for all reports. For example, all people in the report showing new user flow will presumably have no emails.
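The bucketing itself is trivial; a sketch:

```javascript
// Sketch of the email-count bucketing described above.
function emailBucket(numEmails) {
    if (numEmails === 0) return "no emails (new user)";
    if (numEmails === 1) return "1 email";
    if (numEmails === 2) return "2 emails";
    return "3 or more emails";
}
```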

Information architecture for landing screen

Rather than displaying one of the reports, the landing screen should provide an at-a-glance overview of the entire dashboard. This means, at a minimum, links (with explanations) to all the screens, or, better, actual KPI values (or visualizations?).

(Per Austin's suggestion,) interviews with Crystal and Dan about their workflows should give a better picture of what would be helpful.

Add running average to time series visualization

In addition to just showing the value on the given date, there should be a second line representing the average of that value over the preceding 7 days.

For clarity (so that it's actually useful), this should only appear when the data is shown cumulatively (no segmentation), as a line.

(for Dan)
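The 7-day line could be computed with a simple trailing window (a sketch; assumes a daily series with no holes):

```javascript
// Sketch: trailing running average over a daily series. Each point is the
// mean of itself and up to (windowSize - 1) preceding days.
function runningAverage(series, windowSize) {
    return series.map(function(_, i) {
        var start = Math.max(0, i - windowSize + 1);
        var window = series.slice(start, i + 1);
        var sum = window.reduce(function(a, b) { return a + b; }, 0);
        return sum / window.length;
    });
}
```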

How to avoid per-segmentation views in CouchDB?

For background, and what we're trying to accomplish, please see #27.

(Example) task

Counting the number of step completions in a given date range.

To get this data, we set up a view in CouchDB:

{
    map: function(doc) {
        // doc.newUserSteps is the array of steps completed by the user
        if(doc.newUserSteps.length > 0) { // Only count new users
            doc.newUserSteps.forEach(function(step) {
                // Save the completed step, with the date when it was completed as the key
                emit(doc.date, step);
            });
        }
    },

    reduce: function(keys, values, rereduce) {
        if(rereduce) {
            // omitted
        } else {
            // Count the number of times each step was completed
            var steps = {};
            values.forEach(function(step) {
                if(! (step in steps)) {
                    steps[step] = 0;
                }

                steps[step]++;
            });

            return steps;
        }
    }
}

Then, for example, if we wanted a report for June, we could query the view with startkey=2012-06-01&endkey=2012-06-30&group=false. Easy enough.

More complicated

Same task, but now we want to see the data segmented by operating system. So we set up a view.

{
    map: function(doc) {
        if(doc.newUserSteps.length > 0) {
            doc.newUserSteps.forEach(function(step) {
                // Instead of saving just the step, like last time,
                // now we save both the step and the OS.
                emit(doc.date, {
                    step: step,
                    os: doc.os
                });
            });
        }
    },

    reduce: function(keys, values, rereduce) {
        if(rereduce) {
            // omitted
        } else {
            var systems = {};
            values.forEach(function(value) {
                if(! (value.os in systems)) {
                    systems[value.os] = {};
                }

                if(! (value.step in systems[value.os])) {
                    systems[value.os][value.step] = 0;
                }

                systems[value.os][value.step]++;
            });

            return systems;
        }
    }
}

A little bit more complicated, but overall pretty similar. Quite manageable.

Problems

Okay, that worked, but now we want to segment by browser.
That's very similar. The only difference is that, instead of using the doc.os field, we'd need to use the doc.browser field.

Here's the problem: because views in CouchDB can't take arbitrary parameters, we'd have to create a completely separate view with nearly identical code.

And then again, when we want to segment by screen size. And for locale, and so on.

For each segmentation (there will be at least 5 of them), a new view will have to be created, and it will be nearly identical to the other ones.

Furthermore, since most reports will have segmentations, the views will be duplicated across reports.

Question

How can we avoid this?
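One common workaround (an assumption on my part, not something decided in this thread) is to generate the near-identical view definitions from a template in application code, since CouchDB accepts view functions as strings; the duplication then lives in exactly one place:

```javascript
// Hypothetical sketch: build one view per segmentation field from a single
// template, then upload them together in a design document.
function segmentedView(field) {
    var map = [
        "function(doc) {",
        "    if(doc.newUserSteps.length > 0) {",
        "        doc.newUserSteps.forEach(function(step) {",
        "            emit(doc.date, { step: step, segment: doc." + field + " });",
        "        });",
        "    }",
        "}"
    ].join("\n");
    return { map: map }; // reduce omitted; it's identical for every segmentation
}

var designDoc = { views: {} };
["os", "browser", "screen_size", "locale"].forEach(function(field) {
    designDoc.views["steps_by_" + field] = segmentedView(field);
});
```

The views themselves are still duplicated inside CouchDB, but at least the source no longer is.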

Report: user flow over time

To fully address #2, in addition to the visualization of user drop-off introduced in b748117, there should be some way of visualizing the flow of users over time. (The existing report allows the selection of a date range, but doesn't make it easy to track the changes over time.)

A sketch of what this visualization should look like:
sketch of flow visualization
(Thanks to Annie Elliott for the idea!)
