GithubHelp home page GithubHelp logo

gopinathsubbegowda / discovery-hiring-analyst-2016 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from wikimedia-research/discovery-hiring-analyst-2016

0.0 2.0 0.0 18.23 MB

Task description and data for candidates applying to be a Data Analyst in the Discovery department at Wikimedia Foundation.

discovery-hiring-analyst-2016's Introduction

Data Analysis Task

Task description and data for candidates applying to be a Data Analyst in the Discovery department at Wikimedia Foundation.

Background

Discovery (and other teams within the Foundation) rely on event logging (EL) to track a variety of performance and usage metrics to help us make decisions. Specifically, Discovery is interested in:

  • clickthrough rate: the proportion of search sessions where the user clicked on one of the results displayed
  • zero results rate: the proportion of searches that yielded 0 results

and other metrics outside the scope of this task. EL uses JavaScript to asynchronously send messages (events) to our servers when the user has performed specific actions. In this task, you will analyze a subset of our event logs.

Task

You must create a reproducible report* answering the following questions:

  1. What is our daily overall clickthrough rate? How does it vary between the groups?
  2. Which results do people tend to try first? How does it change day-to-day?
  3. What is our daily overall zero results rate? How does it vary between the groups?
  4. Let session length be approximately the time between the first event and the last event in a session. Choose a variable from the dataset and describe its relationship to session length. Visualize the relationship.
  5. Summarize your findings in an executive summary.

* Given dependencies and other instructions, we should be able to re-run your source code with the dataset in the same directory and obtain the same results and figures. Popular formats for this include RMarkdown and Jupyter Notebook (formerly IPython).

Note: if you submit your report as a Jupyter/IPython notebook on Greenhouse, please upload a copy to GitHub and include the link when you submit it on Greenhouse.

Data

The dataset comes from a tracking schema that we use for assessing user satisfaction. Desktop users are randomly sampled to be anonymously tracked by this schema which uses a "I'm alive" pinging system that we can use to estimate how long our users stay on the pages they visit. The dataset contains just a little more than a week of EL data.

Column Value Description
uuid string Universally unique identifier (UUID) for backend event handling.
timestamp integer The date and time (UTC) of the event, formatted as YYYYMMDDhhmmss.
session_id string A unique ID identifying individual sessions.
group string A label ("a" or "b").
action string Identifies in which the event was created. See below.
checkin integer How many seconds the page has been open for.
page_id string A unique identifier for correlating page visits and check-ins.
n_results integer Number of hits returned to the user. Only shown for searchResultPage events.
result_position integer The position of the visited page's link on the search engine results page (SERP).

The following are possible values for an event's action field:

  • searchResultPage: when a new search is performed and the user is shown a SERP.
  • visitPage: when the user clicks a link in the results.
  • checkin: when the user has remained on the page for a pre-specified amount of time.

Example Session

uuid timestamp session_id group action checkin page_id n_results result_position
4f699f344515554a9371fe4ecb5b9ebc 20160305195246 001e61b5477f5efc b searchResultPage NA 1b341d0ab80eb77e 7 NA
759d1dc9966353c2a36846a61125f286 20160305195302 001e61b5477f5efc b visitPage NA 5a6a1f75124cbf03 NA 1
77efd5a00a5053c4a713fbe5a48dbac4 20160305195312 001e61b5477f5efc b checkin 10 5a6a1f75124cbf03 NA 1
42420284ad895ec4bcb1f000b949dd5e 20160305195322 001e61b5477f5efc b checkin 20 5a6a1f75124cbf03 NA 1
8ffd82c27a355a56882b5860993bd308 20160305195332 001e61b5477f5efc b checkin 30 5a6a1f75124cbf03 NA 1
2988d11968b25b29add3a851bec2fe02 20160305195342 001e61b5477f5efc b checkin 40 5a6a1f75124cbf03 NA 1

This user's search query returned 7 results, they clicked on the first result, and stayed on the page between 40 and 50 seconds. (The next check-in would have happened at 50s.)

discovery-hiring-analyst-2016's People

Contributors

bearloga avatar ironholds avatar

Watchers

James Cloos avatar Gopinath Subbegowda avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.