gastatsdash's People

Contributors

eurogamerops, fayebutler, mrmonkington

Forkers

mrmonkington

gastatsdash's Issues

Social Chart

Chart just social referral data (visitors/pageviews) over network and individual site

Easy way to preview reports during development

We need a Python tool to preview the different report types that we have. It should be invoked like this:

python preview_report.py NetworkArticleBreakdown

Running this will generate a sample NetworkArticleBreakdown report for the last few days and write it to NetworkArticleBreakdown_preview_report.html.
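A minimal sketch of what such a script could look like. The `render_preview` helper, the `--days` option and its default are hypothetical stand-ins; a real version would build the actual report class rather than a placeholder page.

```python
import argparse
from datetime import date, timedelta


def render_preview(report_name, start, end):
    # Hypothetical stand-in: a real version would instantiate the report
    # class called `report_name` and return its rendered HTML.
    return "<html><body><h1>%s</h1><p>%s to %s</p></body></html>" % (
        report_name, start.isoformat(), end.isoformat())


def preview_filename(report_name):
    # Follows the naming described in the issue.
    return "%s_preview_report.html" % report_name


def main(argv=None):
    parser = argparse.ArgumentParser(description="Preview a report type")
    parser.add_argument("report_name", help="e.g. NetworkArticleBreakdown")
    parser.add_argument("--days", type=int, default=3,
                        help="length of the sample period (assumed default)")
    args = parser.parse_args(argv)

    # Sample period: the last `days` complete days, ending yesterday.
    end = date.today() - timedelta(days=1)
    start = end - timedelta(days=args.days - 1)

    path = preview_filename(args.report_name)
    with open(path, "w") as handle:
        handle.write(render_preview(args.report_name, start, end))
    return path
```

Calling `main()` from an `if __name__ == "__main__":` guard in preview_report.py would give the command-line signature above.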

Reprioritise reports

Reports for longer periods of time should take priority over reports for shorter periods of time, e.g. yearly reports should run before monthly reports, which should run before daily reports.
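One way this ordering could be expressed is a rank per frequency. The frequency names and the dict shape of a report are assumptions for illustration, not the scheduler's actual data model.

```python
# Assumed frequency names; longer periods get a lower rank so they sort first.
FREQUENCY_RANK = {"YEARLY": 0, "MONTHLY": 1, "WEEKLY": 2, "DAILY": 3}


def prioritise(reports):
    """Return reports ordered so that longer-period reports run first.

    `reports` is assumed to be a list of dicts with a "frequency" key.
    sorted() is stable, so reports of equal frequency keep their order.
    """
    return sorted(reports, key=lambda report: FREQUENCY_RANK[report["frequency"]])
```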

Caching data

Currently, we make a lot of calls to Google Analytics, which means we sometimes go over our query limit. Many of these are repeated calls for data that we have fetched before. For instance, the February report needs all the data for February. The March report then needs all the data for March AND for February, so we request the February data again.
If we could store this data somewhere that statsdash checks first, it would save us a lot of Google Analytics queries.

Redis looks like the best option. We could store the data using its "Hash" data type. It also supports persistence, through RDB snapshot (.rdb) and append-only (.aof) files.
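A rough sketch of the check-the-cache-first idea. The `QueryCache` class, the hash name and the field layout are all assumptions; `client` can be any object with `hget(name, key)` and `hset(name, key, value)` methods, which is the shape redis-py's client exposes. `FakeRedis` is an in-memory stand-in so the sketch runs without a server.

```python
import json


class QueryCache(object):
    """Caches query results in a Redis-style hash (hypothetical helper)."""

    def __init__(self, client, hash_name="statsdash:ga"):
        self.client = client
        self.hash_name = hash_name

    def get_or_fetch(self, site_id, start, end, metrics, fetch):
        # One hash field per (site, date range, metrics) combination.
        field = "|".join([site_id, start, end, ",".join(sorted(metrics))])
        cached = self.client.hget(self.hash_name, field)
        if cached is not None:
            return json.loads(cached)
        result = fetch()  # only hit the API on a cache miss
        self.client.hset(self.hash_name, field, json.dumps(result))
        return result


class FakeRedis(object):
    """In-memory stand-in exposing only hget and hset."""

    def __init__(self):
        self.hashes = {}

    def hget(self, name, key):
        return self.hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value
```

Only completed periods should be cached this way, since figures for the current day or month can still change.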

Filtering for likely bot scraping sites

These sites all exist to scrape RSS feeds from gaming websites, and reproduce the RSS feed of EG's video output. They refer a surprising amount of traffic which I presume is just robotic. (I can't remember the time period for the PV counts below, I think it's YTD.)

gamergossip.net - 80,689
gamerdevices.com - 37,262
gamersfun.net - 23,399
gamingstime.com - 28,218
gamersweb.net - 29,029
gamerfanatics.com - 20,346
gamersnews.net - 18,104

If we can get them shut down or struck out of the Analytics reports that would be lovely. Assuming I'm not missing something and they are legitimate sites.
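If they can't be shut down, striking them out on our side might look something like this sketch. The row shape (source, pageviews) and the function names are assumptions; the domain list is the one above.

```python
# Domains listed in this issue as likely scrapers.
SUSPECT_REFERRERS = frozenset([
    "gamergossip.net", "gamerdevices.com", "gamersfun.net",
    "gamingstime.com", "gamersweb.net", "gamerfanatics.com",
    "gamersnews.net",
])


def normalise(source):
    # Compare case-insensitively and ignore a leading "www.".
    source = source.strip().lower()
    return source[4:] if source.startswith("www.") else source


def split_referrals(rows):
    """Split (source, pageviews) rows into (kept, filtered) lists."""
    kept, filtered = [], []
    for source, pageviews in rows:
        target = filtered if normalise(source) in SUSPECT_REFERRERS else kept
        target.append((source, pageviews))
    return kept, filtered
```

Returning the filtered rows as well makes it easy to keep an eye on how much traffic is being excluded.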

Social PVs & Sessions Export

Jonty has asked for the report that exports a .csv of the monthly Facebook and Twitter sessions and page views for Gamer Network (social_export.py) to be automated, so that it is sent to him every month.

One-off: network-wide report of traffic from social networks

For the AGM next week and on occasion in future - probably quarterly at most - we'd like to show the tangible traffic that results from growing social media profiles. We've already got the data for the number of Facebook likes and Twitter followers by month, we need the traffic they've driven to map against it.

Need this per month from Jan-13 to Nov-15, and across the whole network.

  • Sessions from Facebook
  • Pageviews from Facebook
  • Sessions from Twitter
  • Pageviews from Twitter

Scheduler run length

Cron is set to run the scheduler every hour. A single run of the scheduler can take longer than an hour, so several runs can end up executing at once.
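One possible guard is a non-blocking file lock taken at the start of each run, so an overlapping run exits straight away. This is a sketch using the standard library's `fcntl.flock` (Unix only); the function name and lock path are assumptions.

```python
import fcntl


def acquire_lock(path):
    """Return an open, locked file handle, or None if another run holds it.

    The lock is released when the handle is closed or the process exits.
    """
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        handle.close()
        return None
    return handle
```

The scheduler would call `acquire_lock` once at startup, exit if it gets `None`, and otherwise keep the handle open for the whole run.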

Test suite

  • Scheduler unit tests for the more complicated methods
  • Scheduler integration tests with a test DB and mocked datetime.now() for testing running reports at different frequencies. These should cover edge cases like 28/29 day months, etc.

Monthly social report

This report would give a monthly view on each site's outreach on social and indicate its progress based on prior data and the network as a whole. Highlighting the most and least popular articles will show what works and what doesn't on different networks.

For the significant social networks (more on that below) I would like the following data for each active site in the network:

Visits, plus change MoM and YoY
PVs, plus change MoM and YoY
Visits as % of total site visits, followed by % of visits to the network from this social network during same time period
PVs as % of total site PVs, followed by % of PVs to the network from this social network during same time period
Top 10 articles on each social network and the PVs from each
Bottom 5 articles on each social network and the PVs from each
Chart of historical data showing monthly visits and PVs for this site, and comparing them to the network as a whole (not sure how to present this as the figures will be on very different scales, but I would like to be able to view a site's reach vs the overall network trend)

The list of networks should lead with Facebook, Twitter, Reddit and YouTube, but we want to keep an eye on other networks in case they start driving significant traffic. Duplicating the social-network readout of the existing monthly/daily reports would be fine, assuming that is calculated rather than hard-coded.

Each report goes to the site head and Jonty.

Updates

Just a few things I noticed when writing the docs

  • There should be a base Google Analytics report that the social reports etc. inherit from, as there are a couple of repeated functions
  • Check the execute query function in YouTube analytics: does it return the right error? Should there be exponential backoff, as in Google Analytics?
  • Change the YouTube config so that it's the same as Google Analytics and channels can have multiple IDs. Although is this feature even needed for YouTube?
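For the backoff question, a generic retry wrapper could be shared by both data sources. This is only a sketch: `with_backoff` and its parameters are hypothetical, and it is not the existing Google Analytics code.

```python
import random
import time


def with_backoff(call, retries=5, base_delay=1.0, retry_on=(Exception,),
                 sleep=time.sleep):
    """Call `call()`, retrying with exponentially growing delays.

    Delays are base_delay * 2**attempt plus up to one second of jitter.
    The last failure is re-raised once `retries` attempts are used up.
    """
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.random())
```

In practice `retry_on` would be narrowed to the HTTP error type raised for rate-limit and server errors, rather than every exception.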

HTML File Size

When the HTML file size is over 102 KB, Gmail clips the email and shows "message clipped".
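A simple size check before sending could flag reports that are likely to be clipped. The threshold below takes the 102 KB figure from this issue and assumes it means 102 * 1024 bytes of UTF-8 encoded HTML; the names are hypothetical.

```python
# Assumed threshold: the 102 KB figure reported in this issue, in bytes.
CLIP_LIMIT_BYTES = 102 * 1024


def will_be_clipped(html, limit=CLIP_LIMIT_BYTES):
    """Return True if the encoded HTML is larger than the clip limit."""
    return len(html.encode("utf-8")) > limit
```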

Users shouldn't have to edit `Youtube/analytics.py`

In the docs this bit mentions adding the following to the Youtube/analytics.py file:

import sys
sys.path.append("/path/to/gastatsdash")
if __name__=="__main__":
    analytics = Analytics()
    print analytics.get_content_owner()["items"][0]["id"]

But adding config to a file that is tracked in Git is bad, so users shouldn't have to do this. We should be able to load the statsdash path from a config, or work it out from the current working directory.
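A third option is to derive the path from the module's own location. This sketch assumes the layout seen elsewhere in these issues (gastatsdash/Statsdash/Youtube/analytics.py); the helper names are hypothetical.

```python
import os
import sys


def project_root(module_file, levels_up=2):
    """Return the directory `levels_up` levels above the module's directory.

    With the assumed layout, two levels above Youtube/ is the repository
    root.
    """
    path = os.path.dirname(os.path.abspath(module_file))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


def ensure_on_path(root):
    # Add the root to sys.path once, without a hard-coded location.
    if root not in sys.path:
        sys.path.append(root)
```

Inside analytics.py this would be used as `ensure_on_path(project_root(__file__))`.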

Updates to daily and monthly reports

Promised followup from Jonty and Simon.

Site daily report

  1. Compare MTD figure to full-month figure last year and post the difference as “To exceed this month last year, we need XXXX PVs in the next Y days” where Y is the number of days remaining in the month
  2. Add country split so people understand where their audience is. Suspect full list would take up too much space, so maybe only show top five countries (unless there is a significant audience across a wider number of territories)
  3. Split “top articles” list so it’s the top 5(?) of each article category (if this suits the site.) So news, reviews, features, tips etc.
  4. Ideally the report will display in a single email without displaying "message clipped"
  5. Include lowest-performing articles as discussed
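The figures for item 1 could be worked out along these lines. It is a sketch that assumes the MTD figure already includes today's traffic, so today is not counted as a remaining day; the function name is hypothetical.

```python
import calendar
from datetime import date


def pvs_to_exceed_last_year(mtd_pvs, last_year_month_pvs, today):
    """Return (pvs_needed, days_left) for the "to exceed" message.

    pvs_needed is one more than the gap, because the target is to exceed
    last year's full-month figure, not to match it.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_left = days_in_month - today.day
    pvs_needed = max(last_year_month_pvs - mtd_pvs + 1, 0)
    return pvs_needed, days_left
```

For example, `pvs_to_exceed_last_year(700000, 1000000, date(2016, 2, 20))` gives 300,001 PVs over the 9 remaining days of a leap-year February.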

Both reports

  1. Standardise with PVs in the left-hand column and visits on the right
  2. Combine the referrals from m.neogaf.com and neogaf.com into a single figure
  3. Exclude Feedly from the referral list
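Items 2 and 3 could both be handled in one pass over the referral rows. The row shape (source, pageviews) and the exact source name for Feedly are assumptions.

```python
from collections import defaultdict

# Sources to merge into a single figure, and sources to drop entirely.
ALIASES = {"m.neogaf.com": "neogaf.com"}
EXCLUDED = frozenset(["feedly.com"])  # assumed source name for Feedly


def tidy_referrals(rows):
    """Merge aliased sources, drop excluded ones, sort by pageviews."""
    totals = defaultdict(int)
    for source, pageviews in rows:
        source = ALIASES.get(source, source)
        if source in EXCLUDED:
            continue
        totals[source] += pageviews
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Keeping the aliases and exclusions in config rather than code would avoid a code change each time the list grows.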

Not really a reporting issue, although it sort of is
When the message is truncated - as the network report now is - can we track and see how many people click through to the full message?

Leap year issue with dates

Leap years don't seem to be accounted for when calculating the next run. This meant the monthly February report ran on 29th February (instead of 1st March), and the monthly March report is now set to run on 28th March instead of 1st April.
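Asking the standard library for the real month length avoids hard-coding 28 days for February. A minimal sketch, with a hypothetical function name:

```python
import calendar
from datetime import date, timedelta


def next_monthly_run(period_start):
    """Return the first day of the month after `period_start`'s month.

    calendar.monthrange accounts for leap years, so February 2016 is
    treated as 29 days long.
    """
    days_in_month = calendar.monthrange(period_start.year, period_start.month)[1]
    return period_start.replace(day=1) + timedelta(days=days_in_month)
```

With this, the February 2016 report is due on 1st March and the March report on 1st April.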

Traffic Sources Report

Traffic by source, with period-on-period movement
Traffic by device (i.e. Mobile, Desktop and Tablet), with period-on-period movement
Social media traffic, with period-on-period movement

Checking for available data

We need to update how we check whether the data for a site is available.
The check currently looks at page views for each hour of the day. If a low-traffic site has zero page views in a particular hour, that hour is not returned, so the check wrongly reports the data as not available.

Analytics v4

Updating to version 4 of the Analytics Reporting API could save us a lot of requests, as you can now request multiple date ranges in one query.
https://developers.google.com/analytics/devguides/reporting/core/v4/
Migration can be done using this: https://github.com/googleanalytics/gav4-python
A bit of a rethink/refactor may be needed, though, especially as the response objects are laid out differently.

Note: the default for the include-empty-rows parameter has changed to false in v4. It will need to be set to true so that our data-availability checks work.
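As a rough illustration, a v4 request body for two date ranges with empty rows included could be built like this. The helper name and the example view ID are hypothetical; the field names are those of the v4 batchGet request, which accepts at most two date ranges per report request.

```python
def build_report_request(view_id, date_ranges, metrics, dimensions):
    """Build a v4 reports.batchGet body as a plain dict (no network call).

    `date_ranges` is a list of (start, end) strings such as
    ("2016-02-01", "2016-02-29").
    """
    if len(date_ranges) > 2:
        raise ValueError("v4 accepts at most two date ranges per request")
    return {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": start, "endDate": end}
                           for start, end in date_ranges],
            "metrics": [{"expression": metric} for metric in metrics],
            "dimensions": [{"name": dimension} for dimension in dimensions],
            # Defaults to false in v4; needed for the availability checks.
            "includeEmptyRows": True,
        }]
    }
```

February and March could then be fetched together by passing both ranges in one call, instead of issuing two separate queries.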

Monthly social referral report for the network

I need a monthly rollup report showing the total sessions and pageviews from the top social networks.

  • Facebook
  • Twitter
  • Reddit
  • Top three other referring social networks

For each network show the sessions and pageviews from each across the whole network and the MoM and YoY change.
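The MoM and YoY figures are both percentage changes against an earlier period, so one helper could cover both. A sketch with a hypothetical function name; it returns None when there is no earlier figure to compare against, which matters for networks that only recently started referring traffic.

```python
def percent_change(current, previous):
    """Return the percentage change from `previous` to `current`.

    Returns None when `previous` is zero or missing, since the change
    is undefined in that case.
    """
    if not previous:
        return None
    return (current - previous) * 100.0 / previous
```

MoM would pass last month's figure as `previous`, and YoY the figure for the same month a year earlier.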

Each report shows these figures for the last 12 months, if that's possible - is that OK when we can't guarantee the "top three referrers" will be the same each month?

Report goes to me.

Documentation

We need to document statsdash using Sphinx (http://www.sphinx-doc.org/en/stable/index.html) and get it on to our Read the Docs account.

Docs should be RST format - check out the 'getting started' tutorial here: http://www.sphinx-doc.org/en/stable/tutorial.html. They can be multiple pages/sections as appropriate - whatever reads best!

The docs should roughly look like this: https://gamer-network-mormont.readthedocs-hosted.com/en/latest/service.html

Things to document:

  • What's the goal of this project? What features does it provide and who is it aimed at?
  • How does someone quickly get started (feel free to duplicate this from the readme)
  • Detail where configs live, what the important options are, etc.
  • Go through the interface from a developer's perspective of adding a new report to the schedule - what options are available for specifying the sites a report runs on, the time it runs, override options, other schedule metadata.
  • What types of report are available? What info do they offer? Classnames/paths etc. Image snippets could be handy.
  • Deeper dive in to how the code is structured. How do the following components work/fit together, what are the key classes to look at?
    • scheduler
    • report classes
    • data source classes
    • templates
  • How would you go about extending statsdash? How to add a new report? How to add a new data source (imagine a twitter API)?

Graph

Monthly device graph didn't include July in July's report.

Should error more gracefully/clearly when getting a 403

I'm getting a 403 but it's not immediately obvious as the error I get is this:

Traceback (most recent call last):
  File "scheduler.py", line 218, in run_schedule
    _run(dryrun)
  File "scheduler.py", line 187, in _run
    data_available = report.check_data_availability()
  File "/home/thomas/gastatsdash/Statsdash/report.py", line 143, in check_data_availability
    check = self.data.check_available_data()
  File "/home/thomas/gastatsdash/Statsdash/Youtube/aggregate_data.py", line 42, in check_available_data
    data_available = analytics.data_available(id, self.period.get_end())
  File "/home/thomas/gastatsdash/Statsdash/Youtube/analytics.py", line 106, in data_available
    results['rows']
TypeError: 'NoneType' object has no attribute '__getitem__'

but in the error logs it's a bit clearer:

WARNING 03-10-2016 12-38-33 analytics->execute_query HTTP error 403 occurred:
{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "forbidden",
    "message": "Forbidden"
   }
  ],
  "code": 403,
  "message": "Forbidden"
 }
}

It'd be nice to be able to see this when running the command manually
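One way to surface this would be to fail with a clear message when a query returns nothing, instead of indexing into `None`. This is a sketch with hypothetical names (`AnalyticsQueryError`, `rows_or_raise`), not the current code in analytics.py.

```python
class AnalyticsQueryError(Exception):
    """Hypothetical error raised when a query returns no result object."""


def rows_or_raise(results, description):
    """Return the rows from a query result, or raise a readable error.

    `results` is assumed to be None when the underlying request failed,
    as in the traceback above.
    """
    if results is None:
        raise AnalyticsQueryError(
            "No results for %s; the request probably failed (for example "
            "with HTTP 403). Check the error log for details." % description)
    return results.get("rows", [])
```

Carrying the HTTP status and reason from the failed request into the exception would make the message more useful still.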

Video Report

  • On the second chart, can you calculate the ratios as whole numbers rather than to decimal places
  • For each column calculate the average across the network and display at the top of the chart (with +/- change vs last period). This is to enable comparison between
  • Can we send this weekly as well as monthly please? Ideally run it so that it arrives on Monday morning by 8am, if that's possible - if it runs at (say) 3am then that should catch the bulk of the weekend traffic.

Add site breakdown to monthly dash

From Simax:

Do you think we could evolve the bigger monthly stat dash and show the following by site:

UU, PVs, Sessions and the movement MoM
Traffic by source and movement MoM
Traffic by device i.e. Mobile, Desktop and Tablet and movement MoM

Graphs!

Cos everybody likes a picture.

Rate limit issue

Currently it's easy to hit the rate limits when collecting the stats, and with our current setup we can only afford to run one or two reports per day (despite the rate limit being 50000 requests!), which makes testing very difficult.

There are a few things we should look into to help prevent this problem:

  • Decrease the number of requests per report
  • Perform a rate limit check before running a report
  • Create a mode that makes no requests (the test mode currently still makes those requests) [or does it? see this comment]
  • Perhaps span different sites across different Google accounts, although that would make configuration and account administration kind of a pain so I think this should be treated as a last resort
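The rate limit check could start as a local request counter compared against the daily quota before each report runs. A sketch only: the class and its methods are hypothetical, the 50000 figure is the limit quoted above, and the per-report estimate would have to come from measuring real runs.

```python
class RequestBudget(object):
    """Tracks requests made today against an assumed daily quota."""

    def __init__(self, daily_limit=50000, used=0):
        self.daily_limit = daily_limit
        self.used = used

    def remaining(self):
        return max(self.daily_limit - self.used, 0)

    def can_run(self, estimated_requests):
        # Refuse to start a report that would be cut off part-way through.
        return estimated_requests <= self.remaining()

    def record(self, count=1):
        self.used += count
```

The counter would need to be stored somewhere shared between runs and reset daily for the check to be meaningful.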

Hard coded reference to Jelly Deals

There's a reference to jelly.deals here that we will need to move out to a config or something.

As far as I'm aware, this is just test code and the whole line can be removed

Twitter targeting tool

As discussed (admittedly a little while ago) with Mark, it would be useful to identify the people on Twitter who drive the most referral traffic, and the patterns in which they do so. This information would enable us to target - ideally automatically - the new articles that we post so that they achieve maximum reach.

The data of interest would be:

  • Which followers RT/share most regularly
  • Which followers engage (reply to or share) most regularly
  • Which of the above drive significant interaction and/or further sharing - so, for instance, a Notch RT would be much more valuable than a random spambot or a person with only 25 followers. Ratio of following/followers and a flag for follower count greater than X thousand would be a starting point.
  • What time they tend to be on Twitter/interacting with things on Twitter
  • What they tend to RT/talk about, as ascertained by tracking hashtags, keywords and other accounts they interact with.

This data would enable us to identify who to target with specific content and when.

Ideally we'd be able to pull this data from people who follow our own brands, but also other specified accounts - so if somebody terribly useful tends to RT IGN's stories about Dark Souls lore, for instance, we could @ them when we post our Dark Souls lore feature because we know that they're interested (and because many of our brands are verified, they would be notified of our mention.)
