
SUMO Data Access

Scripts to pull data from various sources for SUMO Dashboards and upload it to GCP.

Datasource Overview

Google Analytics

Uses the Google Analytics Reporting API v4 to pull dimensions and metrics for the SUMO report in Google Analytics.

https://developers.google.com/analytics/devguides/reporting/core/v4/rest/
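As a rough illustration of what a Reporting API v4 request looks like, the sketch below builds a `reports.batchGet` request body for a single date range. The view ID, metric, and dimension names are hypothetical placeholders, not the fields the SUMO report actually uses, and `build_report_request` is an illustrative helper rather than a function from get_ga_data.py.

```python
from __future__ import annotations

from datetime import date


def build_report_request(view_id: str, start: date, end: date,
                         metrics: list[str], dimensions: list[str],
                         page_size: int = 10000) -> dict:
    """Build a Reporting API v4 reports.batchGet body for one date range (illustrative)."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                # The API expects dates as YYYY-MM-DD strings.
                "dateRanges": [{"startDate": start.isoformat(), "endDate": end.isoformat()}],
                "metrics": [{"expression": m} for m in metrics],
                "dimensions": [{"name": d} for d in dimensions],
                "pageSize": page_size,
            }
        ]
    }


# Hypothetical view ID, metrics, and dimensions.
body = build_report_request(
    "123456789",
    date(2020, 1, 1),
    date(2020, 1, 31),
    metrics=["ga:sessions", "ga:users"],
    dimensions=["ga:date"],
)
```

Assuming the google-api-python-client package and service-account credentials are available, a body like this would typically be sent with `build("analyticsreporting", "v4", credentials=credentials).reports().batchGet(body=body).execute()`.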

Make sure the Analytics Reporting API is enabled in the GCP project running the code. On the Google Analytics side, a valid service account must be granted permission to pull data from the SUMO report.

GoogleAnalytics/create_ga_tables.py creates the Google Analytics BigQuery tables with their schema definitions. GoogleAnalytics/get_ga_data.py pulls data for a given date range. The data is written to local CSV files in the /tmp folder and pushed to Google Storage at gs:///googleanalytics/. The Google Storage files are then uploaded to the BigQuery dataset sumo, into the ga_* tables. After upload, the files are moved to the /processed subfolder.

Some of the data pulls hit daily data limits, so it is recommended to run data pulls in one-month chunks. The Google Analytics API has a processing latency of 24-48 hours (https://support.google.com/analytics/answer/1070983?hl=en). To prevent volatile numbers in the last 48 hours, the daily Google Analytics job retrieves data with a 48-hour lag.
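The one-month chunking and the 48-hour lag can be sketched as below. This assumes the lag is applied as "end date = today minus two days" and that chunks break at calendar-month boundaries; both helper names are hypothetical and are not taken from get_ga_data.py.

```python
from datetime import date, timedelta


def lagged_end_date(today: date, lag_days: int = 2) -> date:
    """Latest date treated as stable, given GA's 24-48 hour processing latency."""
    return today - timedelta(days=lag_days)


def month_chunks(start: date, end: date):
    """Yield (chunk_start, chunk_end) pairs covering [start, end], split at month boundaries."""
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        chunk_end = min(next_month - timedelta(days=1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


# Example: backfill from mid-January up to the lagged end date.
end = lagged_end_date(date(2020, 3, 10))
chunks = list(month_chunks(date(2020, 1, 15), end))
```

Each `(chunk_start, chunk_end)` pair could then be passed to a separate data pull, so that no single request spans more than one calendar month.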

Product Insights

Runs sentiment analysis on Twitter tweets and Kitsune questions, and pulls Google Trends (GTrends) data for SUMO. The package uses an unofficial library, PyTrends, to pull data from GTrends.
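A minimal sketch of a GTrends pull with PyTrends, assuming `pytrends` is installed and network access is available. The helper names, the `tz=0` setting, and any keywords passed in are illustrative assumptions; the package's own GTrends code may be structured differently. PyTrends accepts an explicit date range as a `timeframe` string of the form `YYYY-MM-DD YYYY-MM-DD`.

```python
from datetime import date


def trends_timeframe(start: date, end: date) -> str:
    """Format a date range the way PyTrends expects: 'YYYY-MM-DD YYYY-MM-DD'."""
    return f"{start.isoformat()} {end.isoformat()}"


def fetch_interest_over_time(keywords, start: date, end: date, geo: str = ""):
    """Pull Google Trends interest-over-time data via the unofficial PyTrends library.

    Requires `pip install pytrends` and network access; the import is deferred
    so that trends_timeframe() can be used without PyTrends installed.
    """
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=0)
    pytrends.build_payload(keywords, timeframe=trends_timeframe(start, end), geo=geo)
    # pandas DataFrame indexed by date: one column per keyword plus an isPartial flag.
    return pytrends.interest_over_time()
```

For example, `fetch_interest_over_time(["firefox"], date(2021, 1, 1), date(2021, 3, 31))` would request three months of data for a single hypothetical keyword. Because PyTrends is unofficial, Google may rate-limit or change the underlying endpoints without notice.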

Installing / Getting started

The scripts are intended to be run on a Google Cloud Project with necessary account permissions.

The scripts assume the following Google Storage folder structure:

gs:// <sumo-bucket>  
    / googleanalytics => where Google Analytics data files are initially placed
    / googleanalytics / processed => where processed Google Analytics data files are placed after being uploaded to BigQuery
    / googleplaystore => where Google Play Store data files are initially placed [deprecated]
    / googleplaystore / processed => where processed Google Play Store data files are placed after being uploaded to BigQuery [deprecated]
    / tmp => model param files, aggregation files in subfolders by model param
gs:// <data-bucket> => location of parquet input data files

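After a file has been loaded into BigQuery, it is moved to the matching processed subfolder. The sketch below shows one way to compute the destination blob name; the function name and the example file name are hypothetical.

```python
def processed_blob_name(blob_name: str) -> str:
    """Map '<folder>/<file>' to '<folder>/processed/<file>' (illustrative helper)."""
    folder, _, filename = blob_name.rpartition("/")
    if not folder or not filename:
        raise ValueError(f"expected '<folder>/<file>', got {blob_name!r}")
    # Guard against moving a file that has already been processed.
    if folder == "processed" or folder.endswith("/processed"):
        raise ValueError(f"{blob_name!r} is already in a processed folder")
    return f"{folder}/processed/{filename}"


# Hypothetical file name in the googleanalytics/ folder.
new_name = processed_blob_name("googleanalytics/ga_sessions_2020-01.csv")
```

With the google-cloud-storage client, the move itself could then be done with `bucket.rename_blob(blob, processed_blob_name(blob.name))`, which copies the object to the new name and deletes the original.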
packagemanager install awesome-project
awesome-project start
awesome-project "Do something!"  # prints "Nah."

Here you should say what actually happens when you execute the code above.

Initial Configuration

Some projects require initial configuration (e.g. access tokens or keys, npm i). This is the section where you would document those requirements.

Developing

Here's a brief intro about what a developer must do in order to start developing the project further:

git clone https://github.com/your/awesome-project.git
cd awesome-project/
packagemanager install

And state what happens step-by-step.

Building

If your project needs some additional steps for the developer to build the project after some code changes, state them here:

./configure
make
make install

Unit Tests

python setup.py test

Sigh, maybe someday.
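Until a real test suite exists, a first test module could look like the sketch below. It tests a hypothetical `lagged_end_date` helper (not part of the repository) with the standard-library `unittest` module; saved under a file name matching `test*.py`, it would be picked up by `python -m unittest discover`.

```python
import unittest
from datetime import date, timedelta


def lagged_end_date(today: date, lag_days: int = 2) -> date:
    """Hypothetical helper under test: the last date the daily GA job should request."""
    return today - timedelta(days=lag_days)


class LaggedEndDateTest(unittest.TestCase):
    def test_default_lag_is_two_days(self):
        self.assertEqual(lagged_end_date(date(2020, 3, 10)), date(2020, 3, 8))

    def test_lag_crosses_month_boundary(self):
        # 2020 is a leap year, so two days before March 1 is February 28.
        self.assertEqual(lagged_end_date(date(2020, 3, 1)), date(2020, 2, 28))
```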

Deploying / Publishing

Define the GCP storage bucket where the files should go.
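One way to define the target bucket is to read it from the environment rather than hard-coding it. The `SUMO_BUCKET` variable name and both helpers below are assumptions for illustration; the scripts may configure the bucket differently.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable name.
BUCKET_ENV_VAR = "SUMO_BUCKET"


def sumo_bucket(env: Mapping[str, str] = os.environ) -> str:
    """Return the target GCS bucket name, failing loudly if it is not configured."""
    bucket = env.get(BUCKET_ENV_VAR, "").strip()
    if not bucket:
        raise RuntimeError(f"set {BUCKET_ENV_VAR} to the GCS bucket the files should go to")
    return bucket


def gcs_uri(folder: str, filename: str, bucket: Optional[str] = None) -> str:
    """Build a gs://<bucket>/<folder>/<filename> URI, e.g. for the googleanalytics folder."""
    bucket = bucket or sumo_bucket()
    return f"gs://{bucket}/{folder.strip('/')}/{filename}"
```

Failing at startup when the variable is missing avoids silently writing files to an unintended default location.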

packagemanager deploy awesome-project -s server.com -u username -p password

And again you'd need to tell what the previous code actually does.

Features

What bells and whistles does this project offer?

  • What's the main functionality
  • You can also do another thing
  • If you get really randy, you can even do this

Configuration

Here you should list all of the configuration options a user can set when using the project.

Argument 1

Type: String
Default: 'default value'

State what an argument does and how you can use it. If needed, you can provide an example below.

Example:

awesome-project "Some other value"  # Prints "You're nailing this readme!"

Argument 2

Type: Number|Boolean
Default: 100

Copy-paste as many of these as you need.

Contributing

When you publish something open source, one of the greatest motivations is that anyone can just jump in and start contributing to your project.

These paragraphs are meant to welcome those kind souls to feel that they are needed. You should state something like:

"If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome."

If there's anything else the developer needs to know (e.g. the code style guide), you should link it here. If there's a lot of things to take into consideration, it is common to separate this section to its own file called CONTRIBUTING.md (or similar). If so, you should say that it exists here.

Links

Even though this information can be found inside the project in a machine-readable format, such as a .json file, it's good to include a summary of the most useful links for humans using your project. You can include links like:

Licensing

Licensed under ... For details, see the LICENSE file.
