test-mass-forker-org-1 / alligator2

This project is forked from google/alligator2.

A sample integration between the Google My Business API and the Cloud Natural Language API.

License: Apache License 2.0

Languages: Python 58.94%, Jupyter Notebook 41.06%

Alligator 2.0

This is not an officially supported Google product. It is a reference implementation.

This tool is a Python-based solution that aggregates Insights data from the Google My Business API and stores it in Google Cloud Platform, specifically in BigQuery. Insights data details how users interact with Google My Business listings via Google Maps, such as the number of queries for a location, the places from which people searched for directions, and the number of website clicks, calls, and reviews. The tool provides a cross-account view of the data instead of a per-location one.

Along with gathering stats, the Google Cloud Natural Language API is used to provide sentiment analysis and entity detection for supported languages. This is a fully automated process that pulls data from the Google My Business API and places it into a BigQuery instance, processing each review's content with the Natural Language API to generate a sentiment score for analysis. Furthermore, reviews can be classified by 'topics' to help surface areas of improvement across different locations.

Google My Business Account Prerequisites

  • All locations must roll up to a Location Group (formerly known as a Business Account). Click here for more information. Multiple location groups are supported and can be queried accordingly (refer to the samples inside the sql directory).
  • All locations must be verified.

Installation and Setup

Follow the steps below, or alternatively open the alligator2.ipynb notebook using Google Colaboratory (preferred for better formatting) or Jupyter for a more interactive installation experience. The notebook contains additional information on maintenance and reporting, and will help you better visualize the data that will get imported into BigQuery after running the solution.

Unlike traditional notebooks, the alligator2.ipynb notebook references the code in this GitHub repository rather than hosting its own version of the code.

Install Python Libraries

Install the required dependencies

$ pip install --upgrade --quiet --requirement requirements.txt

Google Cloud Platform Project Setup

Follow the steps to Enable the API within the Google My Business basic setup guide and create the OAuth2 credentials required for the next steps.

Go to Enable the Cloud Natural Language API.

Go to Enable the BigQuery API.

Please note that BigQuery provides a sandbox if you do not want to provide a credit card or enable billing for your project. The steps in this topic work whether or not your project has billing enabled. If you want to enable billing, see Learn how to enable billing.

Install OAuth2 Credentials

Create a file named client_secrets.json, with the credentials downloaded as JSON from your Google Cloud Platform Project API Console.
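For reference, an OAuth2 client secret downloaded for an installed application typically has the shape below; every value here is a placeholder, so use the actual JSON downloaded from your own project's API Console:

```json
{
  "installed": {
    "client_id": "1234567890-abc123.apps.googleusercontent.com",
    "client_secret": "YOUR_CLIENT_SECRET",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "redirect_uris": ["http://localhost"]
  }
}
```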

Download the Google My Business API Discovery Documents

As the Google My Business (GMB) API is migrating to a new federated model, this tool needs to work with a different subset of discovery documents until the migration is complete.

Download the federated Google My Business API Discovery Documents

Go to the Samples page, download the discovery docs for the services used by this tool, and save the files as indicated in the following table.

Service File Name
Account Management API mybusinessaccountmanagement_discovery.json
Business Information API mybusinessbusinessinformation_discovery.json

To download each file, right click Discovery doc below the name of each service, and select Save Link As. Then, save the file with the name indicated in the table in the same directory.

Download the v4 Google My Business API Discovery Document

Go to the Samples page, right click Download discovery document, and select Save Link As. Then, save the file as gmb_discovery.json in the same directory.

Run the solution

Execute the script to start the process of retrieving the reviews for all available locations from all accessible accounts for the authorized user:

$ python main.py --project_id=<PROJECT_ID>

The script generates a number of tables in an alligator BigQuery dataset.

CLI Usage

Usage:

$ python main.py [-h] -p PROJECT_ID [-a ACCOUNT_ID] [-l LOCATION_ID]
                 [--language LANG] [--no_insights] [--no_reviews]
                 [--no_sentiment] [--no_directions] [--no_hourly_calls]
                 [--no_topic_clustering] [--sentiment_only] [-q] [-v]

Optional arguments:

-h, --help            show this help message and exit
-p PROJECT_ID, --project_id PROJECT_ID
                      a Google Cloud Project ID
-a ACCOUNT_ID, --account_id ACCOUNT_ID
                      retrieve and store all Google My Business reviews for
                      a given Account ID
-l LOCATION_ID, --location_id LOCATION_ID
                      retrieve and store all Google My Business reviews for
                      a given Location ID (--account_id is also required)
--language LANG
                      the ISO-639-1 language code in which the Google My Business
                      reviews are written (used for sentiment processing). See
                      https://cloud.google.com/natural-language/docs/languages
                      for a list of supported languages
--no_insights         skip the insights processing and storage
--no_reviews          skip the reviews processing and storage
--no_directions       skip the directions processing and storage
--no_hourly_calls     skip the hourly calls processing and storage
--no_sentiment        skip the sentiment processing and storage
--no_topic_clustering skip the extraction of topics for each review
--sentiment_only      only process and store the sentiment of all available
                      reviews since the last run (if --no_sentiment is
                      provided, no action is performed)
-q, --quiet           only show warning and error messages (overrides --verbose)
-v, --verbose         increase output verbosity

Notes

For the initial data load into BigQuery, a maximum of 18 months of insights data will be retrieved, up to 5 days prior to the current date. This is due to the posted 3-5 day delay before the data becomes available in the Google My Business API. For phone calls and driving directions, only data from the last 7 days is retrieved. Finally, data is inserted into BigQuery with a batch size of 5000 to avoid running into API limits, especially when using the BigQuery Sandbox. These defaults are defined in api.py and can be tuned according to individual needs.
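As a rough sketch of how the defaults above interact (the constant and function names here are illustrative, not the actual identifiers in api.py):

```python
from datetime import datetime, timedelta, timezone

# Defaults described above; the authoritative values live in api.py.
INSIGHTS_MONTHS_BACK = 18   # maximum history for the initial load
INSIGHTS_LAG_DAYS = 5       # GMB insights data lags 3-5 days
BATCH_SIZE = 5000           # rows per BigQuery insert batch

def insights_window(now=None):
    """Return the (start, end) datetimes for the initial insights load."""
    now = now or datetime.now(timezone.utc)
    end = now - timedelta(days=INSIGHTS_LAG_DAYS)
    start = end - timedelta(days=INSIGHTS_MONTHS_BACK * 30)  # ~18 months
    return start, end
```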

Furthermore, all available reviews in BigQuery will be used only for the first run of the sentiment analysis. Once the analysis is complete, an empty file named sentiments_lastrun will be created in the application's root directory, and this file's modification timestamp will be used for subsequent sentiment analysis runs so that only non-analyzed reviews are taken into consideration. Delete the file to rerun the analysis on all available reviews.

In terms of language processing, you can use the --language CLI flag to set the language that the Cloud Natural Language API should use for the sentiment analysis. This is particularly useful for datasets whose reviews span multiple languages. Refer to the language support page at https://cloud.google.com/natural-language/docs/languages for a list of languages supported by the API. You might need to deactivate one or more of the text annotation features in api.py accordingly if your language is not yet supported.
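To illustrate where the flag ends up, a Natural Language API analyzeSentiment request body takes an optional language code on the document; the helper below is a hypothetical sketch, not the tool's actual code:

```python
def sentiment_request(review_text, language=None):
    """Build an analyzeSentiment request body.

    When `language` is omitted, the API auto-detects the language of the
    text; passing an ISO-639-1 code (as with --language) forces it.
    """
    document = {"type": "PLAIN_TEXT", "content": review_text}
    if language:
        document["language"] = language
    return {"document": document, "encodingType": "UTF8"}
```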

Finally, using the topic extraction feature requires the sentiment analysis to be enabled (i.e., you can't run the topic extraction with the --no_sentiment flag). This particular use case will generate a file named cluster_labels.txt with a list of recommended topics based on word repetition in the reviews dataset. You can fine-tune this list and add your own terms. If this file exists, it will be read by the tool and used as the list of topics into which reviews are clustered; otherwise, the file will be recreated and the process will use the most frequent nouns.
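A naive stand-in for that bootstrap step can be sketched as below; the real tool restricts candidates to nouns and writes the result to cluster_labels.txt, whereas this illustrative helper just ranks frequent words:

```python
import re
from collections import Counter

def candidate_topics(reviews, top_n=10):
    """Rank the most frequent non-stopword words across all reviews
    as candidate topic labels (a simplified approximation)."""
    words = []
    for review in reviews:
        words.extend(re.findall(r"[a-z']+", review.lower()))
    stopwords = {"the", "a", "an", "and", "is", "was", "it", "very"}
    counts = Counter(w for w in words if w not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]
```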

Authors

Contributors: mohabfekry, tonycoco, miguelfc, dependabot[bot], hectorparragoogle, dulacp, donaldseaton
