GithubHelp home page GithubHelp logo

jfeng3 / backfill-ga4-to-bigquery Goto Github PK

View Code? Open in Web Editor NEW

This project forked from aliasoblomov/backfill-ga4-to-bigquery

0.0 0.0 0.0 92 KB

Backfill-GA4-to-BigQuery" repository offers a solution for users to backfill their GA4 data into BigQuery. This is useful for those who need historical data from the start of their GA4 property, as GA4 data is typically only available in BigQuery after linking the two services. Our solution provides a complete backfill of data to BigQuery

Python 100.00%

backfill-ga4-to-bigquery's Introduction

Backfill-GA4-to-BigQuery

Backfill-GA4-to-BigQuery" repository offers a solution for users to backfill their GA4 data into BigQuery. This is useful for those who need historical data from the start of their GA4 property, as GA4 data is typically only available in BigQuery after linking the two services. Our solution provides a complete backfill of data to BigQuery. It uses OAuth 2.0 credentials for desktop applications, making authentication easier and well-suited for IDEs like Google Colab.

Table of Contents

  1. Features
  2. Prerequisites
  3. Setup and Installation
  4. Troubleshooting
  5. Customization
  6. Contributing
  7. Contact

Features

  • OAuth 2.0 Authentication: Simplifies authentication with OAuth 2.0 credentials, ideal for desktop apps and environments like Google Colab.

  • Service Account Integration: Securely connects to Google Cloud services using service accounts for enhanced security.

  • Data Extraction from GA4: Fetches comprehensive GA4 data from a specified start date, ideal for historical data backfilling.

  • Customizable Configuration: Offers a config.json file for user-specific settings like table prefixes and property IDs.

  • BigQuery Integration: Efficiently processes and stores data in BigQuery with proper schema management.

  • Export Functionality: Enables exporting GA4 data to CSV format for external use.

  • Duplicate Check: Incorporates mechanisms to avoid duplicate data entries in BigQuery.

  • Flexible Data Retrieval: Allows data fetching from a specific date or the previous day.

  • Robust Error Handling: Includes effective error handling and logging for smooth operation.

Prerequisites

  • Google Cloud account with billing enabled.
  • Access to Google Analytics 4 and Google BigQuery.
  • Python environment (Python 3.x recommended).

Setup and Installation

Step 1: Create a New Project and Activate Analytics API

Step 2: Creating a Service Account

  1. Access Google Cloud Console: Visit the Google Cloud Console.

  2. Create a Service Account:

    • Navigate to "IAM & Admin > Service Accounts".
    • Click "Create Service Account", enter a name, description, and click "Create".
    • Grant necessary roles to the service account (e.g., Owner or BigQuery Admin + BigQuery Job User).
  3. Generate a Service Account Key:

    • Click on the created service account to manage it.
    • Go to the "Keys" tab and click "Add Key", then "Create new key".
    • Choose "JSON" as the key type and click "Create".
    • A JSON key file will be downloaded. Store it securely.

Step 3: Setting Up OAuth for Desktop App

To set up OAuth for a desktop application, you need to create an OAuth client ID in Google Cloud Console. Before creating an OAuth client ID, make sure to configure your consent screen if you don't have one already.

Configure the Consent Screen:

  1. Access Consent Screen Configuration:

    • In the Google Cloud Console, navigate to "APIs & Services > OAuth consent screen".
    • Select the external user type.
  2. Fill in Consent Screen Details:

    • Provide the necessary information, such as the app name, user support email, and developer contact information.
    • add your email (and others, if needed) in the "Test users" section.
  3. Publish the App:

    • Once all necessary information is provided, save and publish your consent screen.

Create OAuth 2.0 Client ID:

  1. Navigate to Credentials:

    • Go to "APIs & Services > Credentials".
  2. Create OAuth Client ID:

    • Click "Create Credentials" and select "OAuth client ID".
    • Choose "Desktop app" as the Application type.
    • Provide a name for the client ID and click "Create".
  3. Download Client Configuration:

    • After the OAuth client ID is created, download the client configuration JSON file.
    • This file contains your client ID and secret, which are essential for the OAuth flow.

Note:

  • The script uses a token.pickle file to store access tokens and refresh tokens. Once authenticated, you won't need to repeat the authentication process unless the token is revoked or expired.
  • Ensure that the JSON file is stored securely and referenced correctly in your project.

Step 4: Configuration File

Fill out and save a config.json file with your specific parameters. Example:

{
  "CLIENT_SECRET_FILE": "<Path to your client secret file>",
  "SERVICE_ACCOUNT_FILE": "<Path to your service account JSON file>",
  "SCOPES": ["https://www.googleapis.com/auth/analytics.readonly"],
  "TABLE_PREFIX": "<Prefix for the BigQuery tables>",
  "PROPERTY_ID": "<Google Analytics Property ID>",
  "DATASET_ID": "<BigQuery Dataset ID>",
  "INITIAL_FETCH_FROM_DATE": "YYYY-MM-DD"
}
  • Client Secret and Service Account File: Replace the placeholders with the actual paths to your OAuth client secret and service account JSON files.

  • Table Prefix: Specify the prefix for your BigQuery tables. If the specified prefix does not exist, the script will create tables with this prefix in BigQuery.

  • Property ID and Dataset ID: Insert your Google Analytics Property ID and the BigQuery Dataset ID where data will be stored.

  • Initial Fetch Date: Set the initial date from which to fetch historical data in YYYY-MM-DD format.

Step 5: Installation of Dependencies

Install necessary Python packages:

!pip install google-analytics-data==0.18.4
!pip install google-cloud-bigquery
!pip install google-auth==2.27.0
!pip install google-auth-oauthlib
!pip install google-auth-httplib2

Step 6: Running the Script

After configuring the config.json file and saving the source code with the same name backfill-ga4.py , it's time to run the script with the desired flags.

  • Execute the Script with Flags:
    • Use the %run command followed by the script name and the desired flag.
    • For fetching data from yesterday, use:
      %run backfill-ga4.py --yesterday
    • For fetching data from the initial fetch date specified in your config.json, use:
      %run backfill-ga4.py --initial_fetch
    • This will start the authentication flow

Step 7: Authentication

  • Run the Script:
    • Execute your Python script.
    • It will prompt you to open a URL for authentication.
    • Ensure that you choose a Google account that has the necessary access to selected property.
    • If you don't verify your app in the first step, select "Go to 'YourPublishedAPP'(unsafe)" to access the authentication code on localhost.
    • Your code can be found as a part of the URL between "code=" and the next ampersand. (Screenshot attached) Ga4-bigquery-Script Authentication.png
    • Copy and paste this code back into the script.
    • Now data retrieval process based on the specified date range should be completed. It is important that a table, that has been exported, is visible in both Bigquery table and CSV downloadable file.

Step 8: QA

  • Check for Successful Setup:
    • Upon successful completion of the script, it should indicate that the authentication process is complete and data fetching has started.
    • Now, you should be able to see the new tables in your Google Analytics BigQuery dataset (DATASET_ID specified in your config.json).
    • Additionally, the output.csv file in your project directory should contain the fetched data.
    • If the tables are visible and the CSV file has data, everything is set up correctly.

Troubleshooting

AttributeError on Script Execution

Issue: Encountering an AttributeError related to credentials.universe_domain when running the script.

Solution: This is likely due to version mismatches in google-auth and google-analytics-data libraries. Resolve it by upgrading both libraries:

pip install --upgrade google-analytics-data google-auth

Run this command in your terminal or command prompt to ensure you're using compatible versions, which should fix the issue.

Customization

Your project can be customized to fetch different metrics and dimensions based on your specific needs. Use the Google Analytics Data API schema to understand the available metrics and dimensions. You can then modify the script to query different sets of data from your Google Analytics account.

  • Tailor Metrics and Dimensions: In the script, identify the sections where API requests are constructed and modify the metrics and dimensions according to your requirements.
  • Consult API Schema: The API schema documentation provides a comprehensive list of all available metrics and dimensions, along with their descriptions and usage.

Contributing

Contributions to this project are welcome! Here's how you can help:

  • Reporting Issues: Report issues or bugs by opening a new issue in the GitHub repository.
  • Feature Requests: If you have ideas for new features or improvements, feel free to create an issue describing your suggestion.
  • Submitting Pull Requests: You can contribute directly to the codebase. Please ensure your code adheres to the project's coding standards and include tests for new features.

Contact

For help, feedback, or discussions about potential features, please feel free to connect with me on Linkedin.

backfill-ga4-to-bigquery's People

Contributors

aliasoblomov avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.