aliasoblomov / universal-analytics-to-bigquery

Universal-Analytics-to-BigQuery

This repository features a Python script that extracts data from Universal Analytics, prepares it for compatibility, and loads it into Google BigQuery. It is particularly useful for businesses that want to transfer their historical UA (GA3) data to BigQuery, especially those without access to Google Analytics 360. For a detailed walkthrough of how to use this script, including setup instructions, customization tips, and best practices, see my Medium article: Backfill Universal Analytics to BigQuery: Zero Cost, Full Control. It provides in-depth guidance and practical steps to make your data migration as smooth as possible.

Features

  • Initial Setup: Configures API scopes, authentication key file location, UA view ID, and BigQuery project details.
  • Google Analytics Reporting API Initialization: Sets up the connection using service account credentials.
  • Data Retrieval from UA: Fetches data based on specified metrics, dimensions, and date ranges.
  • Data Conversion to DataFrame: Converts API response into a Pandas DataFrame.
  • Uploading to BigQuery: Handles DataFrame column renaming, BigQuery client initialization, table creation, and data uploading.
  • Availability for Both Billed and Sandbox Accounts: The script supports both billed and sandbox BigQuery accounts, allowing for versatile testing and deployment environments.
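To illustrate the column-renaming step mentioned above, here is a minimal sketch. The helper name is hypothetical, not the script's actual identifier; the motivating fact is that BigQuery column names may only contain letters, digits, and underscores, so the "ga:" prefix the API puts on every field has to go:

```python
import pandas as pd

def rename_for_bigquery(df: pd.DataFrame) -> pd.DataFrame:
    # Strip the "ga:" prefix so every column is a valid BigQuery identifier.
    return df.rename(columns=lambda c: c.replace("ga:", ""))

report_df = pd.DataFrame({"ga:dateHourMinute": ["202301011230"],
                          "ga:sessions": [42]})
report_df = rename_for_bigquery(report_df)
print(list(report_df.columns))  # → ['dateHourMinute', 'sessions']
```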

Prerequisites

  1. Enable Google Analytics Reporting API: Visit https://console.cloud.google.com/apis/api/analyticsreporting.googleapis.com/ and enable the API.
  2. Service Account and JSON Key File:
    • In the Google Cloud Console, navigate to “IAM & Admin” > “Service Accounts.”
    • Create a new service account, or select an existing one, with “Owner”-level access.
    • Generate and download a JSON key file for the service account.
    • Securely store the JSON key file and note its path for the KEY_FILE_LOCATION setting in the script.
  3. Add Service Account to UA Property Access Management: Include the service account email in the UA property access management.

Setup and Configuration

Fill in the following data in the script:

  1. KEY_FILE_LOCATION: Path to your downloaded JSON key file for the service account.
  2. VIEW_ID: Your Google Analytics View ID, found under Admin → View Settings in Google Analytics.
  3. BIGQUERY_PROJECT: Your Google Cloud project ID.
  4. BIGQUERY_DATASET: Dataset in BigQuery to store your data.
  5. BIGQUERY_TABLE: Table in BigQuery where you want to store your data.
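Filled in, this part of the script might look like the following sketch. The constant names are the ones listed above; every value is a placeholder to replace with your own details:

```python
# Placeholder values; replace each with your own project's details.
KEY_FILE_LOCATION = "/path/to/service-account-key.json"  # JSON key from step 2
VIEW_ID = "123456789"                # Admin -> View Settings in Google Analytics
BIGQUERY_PROJECT = "my-gcp-project"  # Google Cloud project ID
BIGQUERY_DATASET = "ua_backfill"     # destination dataset
BIGQUERY_TABLE = "ua_historical"     # destination table
```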

Customizing Metrics and Dimensions

To tailor the script to your specific analytical needs, you can customize the metrics and dimensions in the get_report function.

  • Use the Universal Analytics to GA4 Reporting API - Dimensions and Metrics documentation to find available dimensions and metrics.
  • Each additional dimension splits the data into more rows, giving a more granular report; fewer dimensions produce a more aggregated report with fewer rows. Note that the Reporting API limits how many dimensions a single request can include.
  • Based on your requirements, modify the metrics and dimensions lists in the get_report function.
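As an illustration, a request body with a trimmed-down set of metrics and dimensions might look like this sketch (the view ID and date range are placeholders; the metric and dimension names are standard UA fields):

```python
# Illustrative request body for get_report; viewId and dates are placeholders.
# The Reporting API v4 accepts at most 10 metrics and 9 dimensions per request.
report_request = {
    'viewId': '123456789',
    'dateRanges': [{'startDate': '2023-01-01', 'endDate': '2023-12-31'}],
    'metrics': [
        {'expression': 'ga:sessions'},
        {'expression': 'ga:pageviews'},
        {'expression': 'ga:users'},
    ],
    'dimensions': [
        {'name': 'ga:dateHourMinute'},
        {'name': 'ga:pagePath'},
        {'name': 'ga:source'},
    ],
}
```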

Running the Script

Execute the script directly as a standalone program. The main function orchestrates the data transfer process, with exception handling for potential errors.
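The control flow that main orchestrates can be sketched roughly as follows. Every helper name here is illustrative rather than the script's actual identifiers, and the stubs merely stand in for the real API calls:

```python
def fetch_report():
    # Stub: the real script authenticates with the service account and
    # calls the Analytics Reporting API here.
    return {'reports': []}

def upload_to_bigquery(report):
    # Stub: the real script converts the report to a DataFrame and loads
    # it into the configured BigQuery table here.
    return len(report['reports'])

def main():
    try:
        report = fetch_report()
        uploaded = upload_to_bigquery(report)
        print(f"Uploaded {uploaded} report(s)")
    except Exception as exc:
        # Surface API/auth errors instead of failing silently.
        print(f"Data transfer failed: {exc}")
        raise

if __name__ == "__main__":
    main()
```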

Why Choose This Approach

  • Limited Historical Data in UA: UA's data retention limits are overcome by BigQuery's indefinite storage.
  • Custom Data Analysis Needs: BigQuery allows for more sophisticated, customized data analysis.
  • Data Ownership and Portability: Offers better control over data governance and portability.

Contributing

Contributions to this project are welcome! The goal of this project is to create an efficient, user-friendly tool for migrating data from Universal Analytics to BigQuery. Whether it's feature enhancements, bug fixes, or documentation improvements, your input is highly valued.

Contact Information

For help, feedback, or discussions about potential features, please feel free to connect with me on LinkedIn.


universal-analytics-to-bigquery's Issues

Problem with more than 7 dimensions?

When running this schema, the data uploads no problem.

'metrics': [
    {'expression': 'ga:sessions'},
    {'expression': 'ga:pageviews'},
    {'expression': 'ga:users'},
],
'dimensions': [
    {'name': 'ga:dateHourMinute'},
    {'name': 'ga:pageTitle'},
    {'name': 'ga:pagePath'},
    {'name': 'ga:source'},
    {'name': 'ga:city'},
    {'name': 'ga:region'},
    {'name': 'ga:deviceCategory'},
],

But this schema, which adds two extra dimensions for age and gender, causes an error that reads 'source does not have a schema.'

'metrics': [
    {'expression': 'ga:sessions'},
    {'expression': 'ga:pageviews'},
    {'expression': 'ga:users'},
    # Add or remove metrics as per your requirements
],
# only allow nine
'dimensions': [
    {'name': 'ga:dateHourMinute'},
    {'name': 'ga:pageTitle'},
    {'name': 'ga:pagePath'},
    {'name': 'ga:source'},
    {'name': 'ga:city'},
    {'name': 'ga:region'},
    {'name': 'ga:deviceCategory'},
    {'name': 'ga:userAgeBracket'},
    {'name': 'ga:userGender'},
],

Is there a limit to the number of dimensions we can request? I thought it was 10.

2 pull requests

Hi @aliasoblomov, thanks for the awesome script! Saved me a lot of work.

I updated the script to automatically pull all pages of data and to group by date. I thought people would find that helpful.

Also -- I updated the readme to reflect that the bigquery.jobs.create permission has to be granted to the service account separately from just the permission over the new data source.

I couldn't publish a branch (no permissions?) so I had to do it manually through the github.com web UI.
