
googlecloudplatform / professional-services


Python 47.74% Shell 2.08% CSS 0.32% HTML 22.80% JavaScript 1.76% Dockerfile 0.38% Go 7.20% Java 3.33% TypeScript 2.95% Makefile 0.26% HCL 4.31% Scala 1.89% Jinja 1.10% Smarty 0.36% C# 2.17% PLSQL 1.19% PowerShell 0.17%
google-cloud-platform google-cloud-dataflow google-cloud-ml google-cloud-compute gke bigquery solutions tools examples

professional-services's Introduction

Professional Services

Common solutions and tools developed by Google Cloud's Professional Services team.

Disclaimer

This repository and its contents are not an officially supported Google product.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.


Examples

The examples folder contains example solutions across a variety of Google Cloud Platform products. Use these solutions as a reference for your own or extend them to fit your particular use case.

Tools

The tools folder contains ready-made utilities which can simplify Google Cloud Platform usage.

  • Agile Machine Learning API - A web application which provides the ability to train and deploy ML models on Google Cloud Machine Learning Engine, and visualize the predicted results using LIME through simple POST requests.
  • Airflow DAG Metadata Generator - Use Google's generative models to analyze Airflow DAGs and supplement them with generated description, tags, and doc_md values.
  • Airflow States Collector - A tool that creates and uploads an Airflow DAG to the DAGs GCS folder. The DAG incrementally collects Airflow task states and stores them in BigQuery. It also autogenerates a Looker Studio dashboard that queries the BigQuery view.
  • Airpiler - A Python script to convert Autosys JIL files to dag-factory format to be executed in Cloud Composer (managed Airflow environment).
  • Ansible Module for Anthos on Bare Metal - Ansible module for installation of Anthos on Bare Metal.
  • Anthos Bare Metal Installer - An Ansible playbook that can be used to install Anthos Bare Metal.
  • Apache Beam Client Throttling - A library that can be used to limit the number of requests from an Apache Beam pipeline to an external service. It buffers requests to not overload the external service and activates client-side throttling when the service starts rejecting requests due to out of quota errors.
  • API Key Rotation Checker - A tool that checks your GCP organization for API keys and compares them to a customizable rotation period. Regularly rotating API keys is a Google and industry standard recommended best practice.
  • AssetInventory - Import Cloud Asset Inventory resources into BigQuery.
  • BigQuery Discount Per-Project Attribution - A tool that automates the generation of a BigQuery table that uses existing exported billing data, by attributing both CUD and SUD charges on a per-project basis.
  • BigQuery Policy Tag Utility - Utility class for tagging BQ Table Schemas with Data Catalog Taxonomy Policy Tags. Create BQ Authorized Views using Policy Tags. Helper utility to provision BigQuery Dataset, Data Catalog Taxonomy and Policy Tags.
  • BigQuery Query Plan Exporter - Command line utility for exporting BigQuery query plans in a given date range.
  • BigQuery Query Plan Visualizer - A web application which provides the ability to visualise the execution stages of BigQuery query plans to aid in the optimization of queries.
  • BigQuery z/OS Mainframe Connector - A utility used to load COBOL MVS data sets into BigQuery and execute query and load jobs from the IBM z/OS Mainframe.
  • Boolean Organization Policy Enforcer - A tool that finds projects where a boolean organization policy is not in its expected state and then sets the policy to that expected state.
  • Capacity Planner CLI - A stand-alone tool to extract peak resource usage values and corresponding timestamps for a given GCP project, time range and timezone.
  • Capacity Planner Sheets Extension - A Google Sheets extension to extract peak resource usage values and corresponding timestamps for a given GCP project, time range and timezone.
  • CloudConnect - A package that automates the setup of dual VPN tunnels between AWS and GCP.
  • Cloudera Parcel GCS Connector - This script helps you create a Cloudera parcel that includes the Google Cloud Storage connector. The parcel can be deployed on a Cloudera-managed cluster.
  • Cloud AI Vision Utilities - This is an installable Python package that provides support tools for Cloud AI Vision. Currently there are a few scripts for generating an AutoML Vision dataset CSV file from either raw images or image annotation files in PASCAL VOC format.
  • Cloud Composer Backup and Recovery - A command line tool for applying backup and recovery operations on Cloud Composer Airflow environments.
  • Cloud Composer DAG Validation - An automated process for running validation and testing against DAGs in Composer.
  • Cloud Composer Migration Complexity Assessment - An Airflow DAG that uses a variety of tools to analyze a Cloud Composer 1 environment, determine a work estimate, and accelerate the conversion of Airflow 1 DAGs to Airflow 2 DAGs.
  • Cloud Composer Migration Terraform Generator - Analyzes an existing Cloud Composer 1 / Airflow 1 environment and generates Terraform that configures a new Cloud Composer 2 environment to meet your workload demands.
  • CUD Prioritized Attribution - A tool that allows GCP customers who purchased Committed Use Discounts (CUDs) to prioritize a specific scope (e.g. project or folder) to attribute CUDs first before letting any unconsumed discount float to other parts of an organization.
  • Custom Role Analyzer - Provides insights into custom roles at the organization and project level, identifying the predefined roles from which each custom role is built.
  • Custom Role Manager - Manages organization- or project-level custom roles by combining predefined roles and including and removing permissions with wildcards. Can run as Cloud Function or output Terraform resources.
  • Dataproc Event Driven Spark Recommendations - Use Google Cloud Functions to analyze Cloud Dataproc clusters and recommend best practices for Apache Spark jobs. Also logs cluster configurations for future reference.
  • Dataproc Scheduled Cluster Sizing - Use Google Cloud Scheduler and Google Cloud Functions to schedule the resizing of a Dataproc cluster. Changes the primary and secondary worker count.
  • DataStream Deployment Automation - Python script to automate the deployment of Google Cloud DataStream. The script creates connection profiles, creates a stream, and starts it.
  • DLP to Data Catalog - Inspect your tables using Data Loss Prevention for PII data and automatically tag it on Data Catalog using Python.
  • DNS Sync - Sync a Cloud DNS zone with GCE resources. Instances and load balancers are added to the cloud DNS zone as they start from compute_engine_activity log events sent from a pub/sub push subscription. Can sync multiple projects to a single Cloud DNS zone.
  • Firewall Enforcer - Automatically watch and remove illegal firewall rules across an organization. Firewall rules are monitored by a Cloud Asset Inventory feed, which triggers a Cloud Function that inspects each firewall rule and deletes it if it fails a test.
  • GCE Disk Encryption Converter - A tool that converts disks attached to a GCE VM instance from Google-managed keys to a customer-managed key stored in Cloud KMS.
  • GCE switch disk-type - A tool that changes the type of disks attached to a GCE instance.
  • GCE Quota Sync - A tool that fetches resource quota usage from the GCE API and synchronizes it to Stackdriver as a custom metric, where it can be used to define automated alerts.
  • GCE Usage Log - Collect GCE instance events into a BigQuery dataset, surfacing your vCPUs, RAM, and Persistent Disk, sliced by project, zone, and labels.
  • GCP Architecture Visualizer - A tool that takes CSV output from a Forseti Inventory scan and draws a dynamic hierarchical tree diagram of org -> folders -> projects -> gcp_resources using the D3.js JavaScript library.
  • GCP AWS HA VPN Connection terraform - Terraform script to set up HA VPN between GCP and AWS.
  • GCP Azure HA VPN Connection Terraform - Terraform code to set up HA VPN between GCP and Microsoft Azure.
  • GCP Organization Hierarchy Viewer - A CLI utility for visualizing your organization hierarchy in the terminal.
  • GCPViz - A visualization tool that takes input from Cloud Asset Inventory, creates relationships between assets, and outputs a format compatible with Graphviz.
  • GCS Bucket Mover - A tool to move a user's bucket, including objects, metadata, and ACLs, from one project to another.
  • GCS to BigQuery - A tool that fetches object metadata from all Google Cloud Storage buckets and exports it in a format that can be imported into BigQuery for further analysis (a minimal illustrative sketch of this pattern appears after this list).
  • GCS Usage Recommender - A tool that generates bucket-level intelligence and access patterns across a GCP project to recommend object lifecycle management policies.
  • GCVE2BQ - A tool for scheduled exports of VM, datastore and ESXi utilization data from vCenter to BigQuery for billing and reporting use cases.
  • GKE AutoPSC Controller - Google Kubernetes Engine controller, to setup PSC ServiceAttachment for Gateway API managed Forwarding Rules.
  • Global DNS -> Zonal DNS Project Bulk Migration - A shell script for gDNS-zDNS project bulk migration.
  • GKE Billing Export - Google Kubernetes Engine fine grained billing export.
  • gmon - A command-line interface (CLI) for Cloud Monitoring written in Python.
  • Google Cloud Support Slackbot - Slack application that pulls Google Cloud support case information via the Cloud Support API and pushes the information to Slack
  • GSuite Exporter Cloud Function - A script that deploys a Cloud Function and Cloud Scheduler job that executes the GSuite Exporter tool automatically on a cadence.
  • GSuite Exporter - A Python package that automates syncing Admin SDK APIs activity reports to a GCP destination. The module takes entries from the chosen Admin SDK API, converts them into the appropriate format for the destination, and exports them to a destination (e.g: Stackdriver Logging).
  • Hive to BigQuery - A Python framework to migrate Hive tables to BigQuery, using Cloud SQL to keep track of the migration progress.
  • IAM Permissions Copier - This tool allows you to copy supported GCP IAM permissions from unmanaged users to managed Cloud Identity users.
  • IAM Recommender at Scale - A Python package that automates applying IAM recommendations.
  • Instance Mapper - Maps different IaaS VM instance types from EC2 and Azure Compute to Google Cloud Platform instance types using a customizable score-based method. Also supports database instances.
  • IPAM Autopilot - A simple tool for managing IP address ranges for GCP subnets.
  • K8S-2-GSM - A containerized Go app to migrate Kubernetes secrets to Google Secret Manager (to leverage the CSI secret driver).
  • LabelMaker - A tool that reads key:value pairs from a JSON file and labels the running instance and all attached drives accordingly.
  • Logbucket Global to Regional - Utility to change _Default sink destination to regional log buckets
  • Machine Learning Auto Exploratory Data Analysis and Feature Recommendation - A tool that performs comprehensive automated EDA, makes feature recommendations based on it, and generates a summary report.
  • Maven Archetype Dataflow - A maven archetype which bootstraps a Dataflow project with common plugins pre-configured to help maintain high code quality.
  • Netblock Monitor - An Apps Script project that will automatically provide email notifications when changes are made to Google’s IP ranges.
  • OpenAPI to Cloud Armor converter - A simple tool to generate Cloud Armor policies from OpenAPI specifications.
  • Permission Discrepancy Finder - A tool that finds principals with missing permissions on a resource within a project and then grants them the missing permissions.
  • Pubsub2Inbox - A generic Cloud Function-based tool that takes input from Pub/Sub messages and turns them into email, webhooks or GCS objects.
  • Quota Manager - A Python module to programmatically update GCP service quotas such as bigquery.googleapis.com.
  • Quota Monitoring and Alerting - An easy-to-deploy Data Studio Dashboard with alerting capabilities, showing usage and quota limits in an organization or folder.
  • Ranger Hive Assessment for BigQuery/BigLake IAM migration - A tool that assesses which Ranger authorization rules can be migrated or not to BigQuery/BigLake IAM.
  • Reddit Comment Streaming - Use PRAW, TextBlob, and the Google Python API to collect and analyze Reddit comments. Pushes comments to a Google Pub/Sub topic.
  • Secret Manager Helper - A Java library to make it easy to replace placeholder strings with Secret Manager secret payloads.
  • Service Account Provider - A tool to exchange GitLab CI JWT tokens for GCP IAM access tokens, in order to allow GitLab CI jobs to access Google Cloud APIs.
  • Site Verification Group Sync - A tool to provision "verified owner" permissions (to create GCS buckets with custom DNS) based on membership of a Google Group.
  • SLO Generator - A Python package that automates computation of Service Level Objectives, Error Budgets and Burn Rates on GCP, and exports the computation results to available exporters (e.g. Pub/Sub, BigQuery, Stackdriver Monitoring), using policies written in JSON format.
  • Snowflake_to_BQ - A shell script to transfer tables (schema and data) from Snowflake to BigQuery.
  • SPIFFE GCP Proxy - A tool to ease the integration of SPIFFE-supported on-prem workloads with GCP APIs using Workload Identity Federation.
  • STS Job Manager - A petabyte-scale bucket migration tool utilizing Storage Transfer Service.
  • VM Migrator - This utility automates migrating Virtual Machine instances within GCP. You can migrate VMs from one zone to another zone/region, within the same project or to different projects, while retaining all the original VM properties such as disks, network interfaces, IPs, metadata, network tags and much more.
  • VPC Flow Logs Analysis - A configurable Log sink + BigQuery report that shows traffic attributed to the projects in the Shared VPCs.
  • VPC Flow Logs Enforcer - A Cloud Function that will automatically enable VPC Flow Logs when a subnet is created or modified in any project under a particular folder or folders.
  • VPC Flow Logs Top Talkers - A configurable Log sink + BigQuery view to generate monthly/daily aggregate traffic reports per subnet or host, with the configurable labelling of IP ranges and ports.
  • Webhook Ingestion Data Pipeline - A deployable app to accept and ingest unauthenticated webhook data to BigQuery.
  • XSD to BigQuery Schema Generator - A command line tool for converting an XSD schema representing deeply nested and repeated XML content into a BigQuery compatible table schema represented in JSON.
  • Numeric Family Recommender - Oracle - The Numeric Family Recommender is a database script that recommends the best numeric data type for the NUMBER data type when migrating from legacy databases like Oracle to Google Cloud platforms like BigQuery, AlloyDB, Cloud SQL for PostgreSQL, and Google Cloud Storage.
  • Composer DAG Load Generator - An automatic DAG generator tool that creates test workloads on a Cloud Composer environment, to test different Airflow configurations or to fine-tune them using Composer/Airflow metrics.
  • Gradio and Generative AI Example - The example code allows developers to create rapid Generative AI PoC applications with Gradio and Gen AI agents.
  • Memorystore Cluster Ops Framework - A framework that provides the tools to apply cluster-level operations such as cluster backups, migration and validation. The framework can be extended for other use cases as required; it uses RIOT to bridge current product gaps with Memorystore Clusters.
  • ML Project Generator - A utility to create a production-grade ML project template with the best productivity tools installed, like auto-formatting, license checks, and linting.
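
To make the GCS to BigQuery entry above concrete, the following is a minimal, hedged sketch of the general pattern that tool describes: list object metadata with the google-cloud-storage client and stream it into an existing BigQuery table with insert_rows_json. This is not the repository's tool itself; the function name, table ID, and field selection are illustrative assumptions, and the destination table is assumed to already exist with a matching schema.

from google.cloud import bigquery, storage

# Illustrative sketch only (not the repository's gcs-to-bigquery tool):
# collect object metadata from every bucket in a project and stream it into
# an existing BigQuery table whose schema matches the dict keys below.
def export_object_metadata(project_id, table_id):
    gcs = storage.Client(project=project_id)
    bq = bigquery.Client(project=project_id)

    rows = []
    for bucket in gcs.list_buckets():
        for blob in gcs.list_blobs(bucket.name):
            rows.append({
                "bucket": bucket.name,
                "name": blob.name,
                "size_bytes": blob.size,
                "storage_class": blob.storage_class,
                "updated": blob.updated.isoformat() if blob.updated else None,
            })

    # Streaming insert keeps the sketch short; a batch load job may be
    # preferable for large inventories.
    errors = bq.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError("BigQuery insert errors: %s" % errors)

# Example call (placeholder IDs):
# export_object_metadata("my-project", "my-project.metadata.gcs_objects")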

Contributing

See the contributing instructions to get started contributing.

Contact

Questions, issues, and comments should be directed to [email protected].

professional-services's People

Contributors

adrienwalkowiak, alexamies, arthurarg, bmenasha, boredabdel, cyarros10, danieldeleo, dependabot[bot], freedomofnet, galic1987, henryken, iht, kardiff18, ludoo, matt-gen, michaelwsherman, mihir25, misabhishek, morgante, mwallman, omerhabas, pkattamuri, prathapreddy123, rilkeanheart, rosmo, ryanmcdowell, sahanasub, satishathukuri, smeyn, tswast


professional-services's Issues

Export to csv / json

Hi.
Is it possible to export Google Drive activity (i.e. file downloads) to a CSV/JSON file, filtering by a date range?

gke redis-cluster deployment error.

Hi.

Thanks for the article about the GKE redis-cluster example; it is very interesting.

I completed the configuration, and when I try to test it, I get the following error.

kubectl get pods -l app=redis,redis-type=cache -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP          NODE                                        NOMINATED NODE   READINESS GATES
redis-cache-64f4d98457-7lph9   1/1     Running   0          19m     10.0.1.34   gke-dev-ironia-pruebas-d1d66cb9-d9q7        <none>           <none>
redis-cache-64f4d98457-bkwx4   1/1     Running   0          6m58s   10.0.1.27   gke-dev-ironia-default-pool-1f34a49a-nc7b   <none>           <none>
avalencia@cloudshell:~/redis/professional-services/examples/redis-cluster-gke$ kubectl run -it redis-cli --image=redis --restart=Never /bin/bash
If you don't see a command prompt, try pressing enter.
root@redis-cli:/data# redis-cli -c -p 6379 -h 10.0.1.34
10.0.1.34:6379> set foo bar
(error) CLUSTERDOWN Hash slot not served
10.0.1.34:6379>

Any idea about the problem?

Thanks in advance

Missing Dataflow python code - data_generation_for_benchmarking.py

Hi - it looks like the file data_generation_for_benchmarking.py has been removed from the current master? This was previously available in "professional-services/examples/dataflow-python-examples/dataflow_python_examples"

I use this program in a Dataflow example, so could you indicate whether there is an alternative I can use, or whether I should continue to use this example given that the code has been deleted?

thanks

Fix vulnerabilities in bq-visualizer

gts init -y returns the following for bq-visualizer:

audited 43161 packages in 12.274s
found 616 vulnerabilities (1 low, 1 moderate, 612 high, 2 critical)
run npm audit fix to fix them, or npm audit for details

Please update your packages to fix the vulnerabilities.

Cloudera GCS connector parcel - DistCp guide

Hi! This is just a suggestion, not really an issue -- please feel free to close this if this is not the right place to raise this.

TL;DR: it would be nice to show in the guide how to make the connector work for DistCp.

When we follow the guide, there is an issue that always pops up every time we do a DistCp transfer:

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

This results from the HADOOP_CLASSPATH not being updated on all the worker nodes.

The README shows the following snippets for testing:

$ hdfs dfs -ls gs://bucket_name

However, we found out that this is not a good enough test for DistCp, since the -ls command does not need the HADOOP_CLASSPATH to be set in other worker nodes.

It would be very useful to have the Cloudera environment settings for DistCp in the guide, so it would be much more painless to onboard data engineers into GCP.

Thanks!

Asset Inventory tool : Installation issues (json & missing dependencies)

Hi guys

I ran into some issues while following your installation instructions for the Asset Inventory tool:

  • In the config.yaml file, the key "import_pipeline_runtime_environment" has an invalid JSON value. After I fixed it, the issue went away.
  • There is a missing dependency in your requirements.txt file: "requests-futures".

Thx
Evgeny

gce-to-adminsdk generates a lot of errors despite working

Is it normal for gce-to-adminsdk to log a bunch of errors while running in Cloud Functions? It works (makes authenticated API calls to Admin SDK APIs), but every time it runs it produces a good 20 error entries in Stackdriver Logging.

E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth') activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from . import file_cache activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000021-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"function_name":"activities_list_to_json","project_id":"pg-gx-n-app-934447","region":"us-central1"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000019-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'oauth2client' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from oauth2client.locked_file import LockedFile activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 37, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000014-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000012-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'oauth2client' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from oauth2client.contrib.locked_file import LockedFile activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 33, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000007-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000005-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'google.appengine' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from google.appengine.api import memcache activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/__init__.py", line 36, in autodetect activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth activities_list_to_json 715962929541044 

Thoughts?
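
These entries are the well-known googleapiclient discovery-cache fallback: file_cache needs oauth2client < 4.0, so the client logs the failed import chain and then continues without a cache, which makes them noise rather than failures. A hedged sketch of silencing them, assuming the function builds its Admin SDK client with googleapiclient.discovery.build (the key file path, scope, and admin address below are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Disabling the discovery cache skips the oauth2client/file_cache import
# chain that produces the ImportError entries shown in the logs above.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder key file
    scopes=["https://www.googleapis.com/auth/admin.reports.audit.readonly"],
).with_subject("admin@example.com")  # placeholder delegated admin

reports = build(
    "admin", "reports_v1", credentials=credentials, cache_discovery=False
)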

Cloudera GCS connector parcel script does not copy directory files

When running the script, files in the same directory (e.g. the JSON key) are not copied over to the parcel folder.

In the following script:
professional-services/tools/cloudera-parcel-gcsconnector/create_parcel.sh

  # Download gcs connector jar and copy all folder content to parcel location
  # Set flag for further parcel file existence validation
  curl -o gcs-connector-hadoop2-latest.jar --fail ${GCSJAR_LINK} || GCSJAR_FLAG="0"

  # Validate if package downloaded properly
  if [[ ${GCSJAR_FLAG} = "0" ]]; then
    echo "Error: hadoop connector failed to download, check network connectivity or file/folder permissions"
    graceful_exit
  fi

  cp gcs-connector-hadoop2-latest.jar ${PARCEL_FULLNAME}/lib/hadoop/lib/

This part of the script only copies the jar file, but not other files in the folder.

I noticed that the correct code was removed in this commit, specifically:

##Download gcs connector jar and copy all folder content to parcel location
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar
if [[ $? -ne 0 ]]; then
   echo "Download GCS connector failed!"
   exit 1
fi

cp * ${filen^^}-$version/lib/hadoop/lib/

## Create parcel.json file required for parcel packaging
cat >>${filen^^}-$version/meta/parcel.json<< EOL

Asset inventory issue

Running into duplicate key entries at the BigQuery level when using the Asset Inventory App Engine app via the Dataflow template.

[root@jump01 asset_inventory]# bq show -j
Job project1:

Job Type State Start Time Duration User Email Bytes Processed Bytes Billed Billing Tier Labels


load FAILURE 16 Sep 18:57:16 0:00:00.633000 [email protected]

Error encountered during job execution:
Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:

  • Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0
  • gs://-assets/stage/2019-09-16T22:48:25.550864/compute.googleapis.com/GlobalForwardingRule.0.json: Error while reading data, error message: JSON parsing error in row starting at position 0: Multiple definition of field: resource.data.ipProtocol

gke-billing-export can't scrape pod data after one hour anymore

We deployed gke-billing-export on one of our GKE clusters to gather data from six clusters running in three different projects.
It is running with its own service account as stated in the README; permissions were granted as described.
The BigQuery dataset has not been created in any of those projects, it's placed in its own project.

Our setup looks like the following:
project1: cluster1
project2: cluster2
project3: cluster3-6
The used Kubernetes versions differ slightly: Cluster1 and 2 are running on v1.11.5-gke.5, clusters 3-6 are running partially on v1.11.2-gke.25 and v1.11.4-gke.12

During startup, the app discovers all clusters in the three projects, scrapes data from the Kubernetes Master API and writes it successfully into BigQuery.
After one hour, it can no longer query the Kubernetes Master API and it never recovers from that. In the log, we're seeing the following (I replaced the names of projects and clusters with generic ones):

2019/01/14 16:52:05 Fetching a list of all clusters
2019/01/14 16:52:06   project1/cluster1
2019/01/14 16:52:06   project2/cluster2
2019/01/14 16:52:07   project3/cluster3
2019/01/14 16:52:07   project3/cluster4
2019/01/14 16:52:07   project3/cluster5
2019/01/14 16:52:07   project3/cluster6
2019/01/14 16:52:07 Sent 122 rows to bigquery for project "project1" cluster "cluster1" in 654.644221ms
2019/01/14 16:52:07 Sent 211 rows to bigquery for project "project2" cluster "cluster2" in 659.385595ms
2019/01/14 16:52:07 Sent 20 rows to bigquery for project "project3" cluster "cluster4" in 671.863548ms
2019/01/14 16:52:08 Sent 348 rows to bigquery for project "project3" cluster "cluster3" in 1.195654306s
2019/01/14 16:52:08 Sent 291 rows to bigquery for project "project3" cluster "cluster5" in 1.709932261s
2019/01/14 16:52:10 Sent 259 rows to bigquery for project "project3" cluster "cluster6" in 3.468579094s
[...]
2019/01/14 17:52:07 Error fetching pods for project3/cluster3: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster4: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project2/cluster2: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project1/cluster1: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster5: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster6: Unauthorized

We tested different intervals (from 60s to 600s) for scraping and can reproduce the behaviour every time. It works for exactly one hour and then we only see the "Unauthorized" log messages.

Any help what/where to check would be appreciated.

iot-nirvana: Build the solution - Error

The following error was received after using "mvn clean install"
-------------------------------------------------------
T E S T S

Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for IoT Nirvana 0.1-SNAPSHOT:
[INFO]
[INFO] IoT Nirvana ........................................ SUCCESS [ 0.810 s]
[INFO] IoT Nirvana Common ................................. FAILURE [ 6.042 s]
[INFO] IoT Nirvana Frontend ............................... SKIPPED
[INFO] IoT Nirvana Client ................................. SKIPPED
[INFO] IoT Nirvana Pipeline ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.324 s
[INFO] Finished at: 2019-01-26T13:40:21Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project google-cloud-demo-iot-nirvana-common: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test failed: The forked VM terminated without saying properly goodbye. VM crash or System.exit called ? -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :google-cloud-demo-iot-nirvana-common

bq-visualizer: fix building process / improve how to build the code docs.

While building the code for bq-visualizer:

npm install
ng build

and I get the following error

ERROR in src/app/app-routing.module.ts(18,29): error TS2307: Cannot find module './main/main.component'.
src/app/app-routing.module.ts(19,30): error TS2307: Cannot find module './terms/terms.component'.
src/app/app.module.ts(44,29): error TS2307: Cannot find module './main/main.component'.
src/app/app.module.ts(50,30): error TS2307: Cannot find module './terms/terms.component'.

I am not sure if the code is broken or I am not building it correctly; if it is the former, it would be nice to have a better example of how to build the app in the docs.

https://github.com/GoogleCloudPlatform/professional-services/blob/master/tools/bq-visualizer/README.md

Unable to generate and publish mqtt

I'd like to offer some feedback that may help other users.

As I've run it in Python 3, I had to explicitly convert the iterator into a sequence (list, tuple, etc.) so it could be unpacked multiple times.
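
A minimal illustration of the Python 3 behaviour described above, with made-up sample data: zip() returns a one-shot iterator, so it has to be materialized into a sequence before it can be unpacked or iterated more than once.

timestamps = [0, 1, 2]
readings = [0.5, 0.7, 0.9]

pairs = zip(timestamps, readings)        # Python 3: a one-shot iterator
first_pass = list(pairs)                 # consuming it here...
second_pass = list(pairs)                # ...leaves nothing for a second pass: []

pairs = list(zip(timestamps, readings))  # materialize once, reuse freely
print(first_pass, second_pass, pairs)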


Hi @lerrytang ,
I'm unable to generate/publish the MQTT payload when running EnergyDisaggregationDemo_Client.ipynb locally, even though no error message is produced.

Would you help me please? What can I do to make it work?

http://localhost:8890/notebooks/EnergyDisaggregationDemo_Client.ipynb#

before data trimming: data.shape=(432000,)
after data trimming: data.shape=(131820,)
Creating JWT using RS256 from private key file rs256.key
connected: Connection Accepted.
connected: Connection Accepted.
connected: Connection Accepted.
...


GCS Bucket Mover - bucket policy only

Hi,

I love the GCS Bucket Mover tool - but would it be possible to use it in combination with the bucket policy only feature? It looks like it now really depends on ACLs to exist or at least be accessible, which isn't the case when bucket policy only is activated.

Thanks,

Wietse

Verify / Update all python assets for Python3 Compatibility

This issue is to track updates to existing assets to keep them from going stale in the coming Python 2 deprecation.

Getting list of python assets

find ./professional-services -name '*.py' -type f | awk '{ FS = "/"}; {print $3 "." $4}' | uniq
  • tools.dns-sync
  • tools.bigquery-query-plan-exporter
  • tools.bqpipeline
  • tools.ml-dataprep
  • tools.asset-inventory
  • tools.gsuite-exporter
  • tools.gcs-bucket-mover
  • tools.agile-machine-learning-api
  • tools.labelmaker
  • tools.hive-bigquery
  • tools.kunskap
  • tools.cloudconnect
  • tools.site-verification-group-sync
  • tools.gce-quota-sync
  • tools.cloud-vision-utils
  • examples.cloudml-bee-health-detection
  • examples.cloudml-collaborative-filtering
  • examples.dataproc-persistent-history-server
  • examples.e2e-home-appliance-status-monitoring
  • examples.dataflow-python-examples
  • examples.cloud-composer-examples
  • examples.cloudml-churn-prediction
  • examples.bigquery-row-access-groups
  • examples.qaoa
  • examples.cloudml-sklearn-pipeline
  • examples.dataflow-data-generator
  • examples.gcf-pubsub-vm-delete-event-handler
  • examples.cloudml-fraud-detection
  • examples.bigquery-cross-project-slot-monitoring
  • examples.kubeflow-fairing-example
  • examples.gce-to-adminsdk
  • examples.kubeflow-pipelines-sentiment-analysis
  • examples.dlp
  • examples.tensorflow-profiling-examples
  • examples.cloudml-energy-price-forecasting
  • examples.cloudsql-custom-metric
  • examples.python-cicd-with-cloudbuilder
  • examples.cryptorealtime
  • examples.cloudml-sentiment-analysis
  • helpers.sort_lists.py

Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

While building the iot-nirvana demo I stumbled upon this issue:
Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

I fixed the issue as mentioned in this link:
https://stackoverflow.com/questions/53010200/maven-surefire-could-not-find-forkedbooter-class

by updating the pom.xml to include:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.1</version>
        <configuration>
          <useSystemClassLoader>false</useSystemClassLoader>
        </configuration>
      </plugin>
    </plugins>
  </build>
  [...]
</project>

direct-upload-to-gcs example not working

I've successfully deployed the direct-upload-to-gcs example; however, I am getting a '404 page not found' error from the service when I try to create a signed URL. Are the steps provided in the example complete?

Data Generator dependencies broken in 2to3 migration

python data_generator_pipeline.py \
           --schema_file=../../bq_file_load_benchmark/json_schemas/benchmark_table_schemas/100_STRING_10.json \
           --num_records=10 \
           --output_bq_table=data-analytics-pocs:bqbml_test_staging_dataset.100_STRING_10  \
           --project=data-analytics-pocs \
           --setup_file=./setup.py \
           --staging_location=gs://bq_benchmark_dataflow_test/staging \
           --temp_location=gs://bq_benchmark_dataflow_test/temp  \
           --save_main_session \
           --worker_machine_type=n1-highcpu-32 \
           --runner=DataflowRunner

and I'm getting

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 770, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 488, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 314, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/local/lib/python3.7/site-packages/data_generator/PrettyDataGenerator.py", line 30, in <module>
    from google.cloud import storage as gcs
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/__init__.py", line 39, in <module>
    from google.cloud.storage.blob import Blob
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 46, in <module>
    from google.resumable_media.requests import RawDownload
ImportError: cannot import name 'RawDownload' from 'google.resumable_media.requests' (/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/__init__.py)

This seems to be an issue with Python 3 and an old version of Beam / avro-python3.
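
The traceback points at a version skew on the Dataflow workers: the installed google-cloud-storage imports RawDownload, which only exists in newer google-resumable-media releases. A hedged sketch of pinning the two together in the pipeline's setup.py; the package name and version bounds are illustrative assumptions, not values verified against this repo:

from setuptools import find_packages, setup

setup(
    name="data-generator-pipeline",  # placeholder name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # Keep these two in lockstep so the storage client and the
        # resumable-media library agree on the RawDownload interface.
        "google-cloud-storage>=1.26.0",
        "google-resumable-media>=0.5.0",
    ],
)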

"gcloud ml-engine commands have been renamed" message on e2e-home-appliance-status-monitoring example

Hello, while following the e2e-home-appliance-status-monitoring example I got the warning below on step 1.

gcloud ml-engine models create EnergyDisaggregationModel --regions ${REGION} --project ${GOOGLE_PROJECT_ID}

WARNING: The gcloud ml-engine commands have been renamed and will soon be removed. Please use gcloud ai-platform instead.

I guess it would be better to update the README with the new commands.

Thanks!!

Can you build multiple redis-cluster-gke with different namespaces?

Can you build multiple redis-cluster-gke with different namespaces?

MountVolume.SetUp failed for volume "redis-nodes" : configmap references non-existent config key: redis-nodes.txt

redis-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-conf
  namespace: test
data:
  redis.conf: |
    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes

redis-expect.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-expect
  namespace: test
data:
  redis-expect.script: |+
    #!/usr/bin/expect
    spawn /bin/bash
    expect "#" {
      send "(cat /tmp/redis-nodes/redis-nodes.txt) | (xargs -o /tmp/redis-stable/src/redis-cli --cluster create --cluster-replicas 1)\r"
    }
    expect "to accept):" {
      send "yes\r"
    }
    expect "#" {
      send "(cat /tmp/redis-nodes/redis-nodes.txt) | (while read node; do /tmp/redis-stable/src/redis-cli --cluster check \${node}; done)\r"
    }
    interact

redis-cache.yaml

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
  namespace: test
spec:
  minAvailable: 66%
  selector:
    matchLabels:
      app: redis
      redis-type: cache
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
  namespace: test
spec:
  selector:
    matchLabels:
      app: redis
  replicas: 6
  template:
    metadata:
      labels:
        app: redis
        redis-type: cache
        namespace: test
    spec:
      hostNetwork: true
      nodeSelector:
        cloud.google.com/gke-nodepool: test-redis-pool
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: redis-server
          image: "redis:4.0-alpine"
          imagePullPolicy: Always
          command:
            - "redis-server"
          args:
            - "/etc/redis/redis.conf"
            - "--protected-mode"
            - "no"
          resources:
            requests:
              cpu: "1"
              memory: "5Gi"
          ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: redis-cluster
              containerPort: 16379
              protocol: "TCP"
          volumeMounts:
            - name: "redis-conf"
              mountPath: "/etc/redis"
      volumes:
        - name: "redis-conf"
          configMap:
            name: "redis-conf"
            items:
              - key: "redis.conf"
                path: "redis.conf"

redis-create-cluster.yaml

# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: redis-create-cluster
  namespace: test
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 600
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: test-redis-pool
      containers:
      - name: redis-cli
        image: ubuntu
        command: ["/bin/bash", "-c"]
        args: ["apt-get update && \
          DEBIAN_FRONTEND=noninteractive apt-get install -yq tzdata && \
          ln -fs /usr/share/zoneinfo/Asia/Tokyo /etc/localtime && \
          dpkg-reconfigure --frontend noninteractive tzdata && \
          DEBIAN_FRONTEND=noninteractive apt-get install -yq curl gcc make libjemalloc-dev expect && \
          cd /tmp && \
          curl -LO http://download.redis.io/redis-stable.tar.gz && \
          tar zxvf redis-stable.tar.gz && \
          cd redis-stable && \
          make distclean && \
          make && \
          expect -f /tmp/redis-expect/redis-expect.script"]
        volumeMounts:
          - name: "redis-nodes"
            mountPath: "/tmp/redis-nodes"
          - name: "redis-expect"
            mountPath: "/tmp/redis-expect"
      restartPolicy: Never
      volumes:
        - name: "redis-nodes"
          configMap:
            name: "redis-nodes"
            items:
              - key: "redis-nodes.txt"
                path: "redis-nodes.txt"
        - name: "redis-expect"
          configMap:
            name: "redis-expect"
            items:
              - key: "redis-expect.script"
                path: "redis-expect.script"

Asset inventory issue : load failed due to TIMESTAMP to STRING change ???

Hi there,
Cloud Dataflow job is failing due to this error:
sqladmin_googleapis_com_Instance. Field resource.data.scheduledMaintenance.startTime has changed type from TIMESTAMP to STRING [while running 'load_to_bigquery/load_to_bigquery']
this occurs on:

  • sqladmin_googleapis_com_Instance
  • compute_googleapis_com_RegionBackendService.
  • k8s_io_Pod
  • ...

What can I do to correct this?
Thanks a lot for your help.

Asset Inventory tool : pipeline failed on one of the last steps

Hi guys

After I got the view permissions on gs://professional-services-tools-asset-inventory/latest/import_pipeline template, I managed to run it via GAE app.
But it still failed. You can see the screenshot 👍
Screen Shot 2019-04-30 at 13 18 39

In the pipeline options I see parameters which I guess I should be able to change.
Screen Shot 2019-04-30 at 13 21 13

For example: temp_location, staging_location, project (bmenasha-1).
How can I change them, and could that be the reason why the pipeline failed?

Thx

Resolve tensorflow security issue

Update requirements.txt to depend on a more recent version.

tensorflow>=1.12.1

Vulnerable versions: >= 1.0.0, < 1.12.1
Patched version: 1.12.1
NULL pointer dereference in Google TensorFlow before 1.12.2 could cause a denial of service via an invalid GIF file.

Asset Inventory tool : error during pipeline running

Hi guys
I am using the Asset Inventory tool and I got an error during pipeline running:
ImportError: No module named asset_inventory.api_schema
I tried to use save_main_session = True option but it didn't help.

What am I doing wrong?

Thx

Create Hangouts Chat Bot version

This is great. As an enhancement, could you create a version of this using Apps Script that, instead of sending an email, updates a Hangouts Chat room it has been added to?

Thanks!

Asset Inventory tool : Permission denied to template file

Hey guys,
I am trying to run the Asset Inventory tool and the service account I use can't get the template file; I'm getting this error:
Template file failed to load: gs://professional-services-tools-asset-inventory/latest/import_pipeline. Permissions denied. @appspot.gserviceaccount.com does not have storage.objects.get access to professional-services-tools-asset-inventory/latest/import_pipeline

Should I create and upload the template myself?

Fix data-analytics/iot-nirvana/client/pom.xml

There is a vulnerability in org.eclipse.paho:org.eclipse.paho.client.mqttv3.

The proposed remediation is:
Upgrade org.eclipse.paho:org.eclipse.paho.client.mqttv3 to version 1.2.1 or later.

Also, please lint your Java files using google-java-format.

You can use:

java -jar /usr/share/java/google-java-format-1.7-all-deps.jar -r my-file.java

on each java file to lint your code according to Google guidelines for java.

Add flake8 to CI tool to catch python errors sooner

This repo has a lot of issues with consistency.
This will likely be a heavy lift at first but will pay dividends in the long run as the repo grows.

Let's add some more static checks:

  • flake8
  • shellcheck

Let's find a way to automate running the tests in cloud build for new assets.

Can't see bigquery table created.

I tried to run the example, but met a couple of problems.

First, I didn't see the BigQuery table created after running "run_oncloud.sh", although the Dataflow job was created and running. Second, I didn't see any device registered in the IoT registry.

The IoT registry was created manually, because setup_gcp_environment.sh failed when I ran it with region us-east1, and IoT only works in us-central1. I don't know if this matters.

./setup_gcp_environment.sh iot-poc-219115 us-east1 us-east1-b ppiot iotpub1 iotsub1 ppiot

Executing gcloud beta iot registries create ppiot --region us-east1 --event-notification-config=topic=projects/iot-poc-219115/topics/iotpub1
ERROR: (gcloud.iot.registries.create) NOT_FOUND: Cloud region not supported by this service. The name 'projects/iot-poc-219115/locations/us-east1/registries/ppiot' specifies the location 'us-east1', valid cloud regions are {asia-east1,europe-west1,us-central1}.

gcloud beta iot registries create ppiot --region us-central1 --event-notification-config=topic=projects/iot-poc-219115/topics/iotpub1

It would be appreciated if you could advise how I can fix these issues.

Thanks

Larry

Start Subscriber/View

Hi,
Everything is set up and EnergyDisaggregationDemo_Client is up and running, but I cannot get rid of the following error (I also tried declaring subscriber as global):

Would you help me pls?

Creating subscription "sub1" to topic "pred" ...
Subscription "sub1" existed.


UnboundLocalError Traceback (most recent call last)
in
6 subscription_name=SUB_NAME,
7 app_id_name_map=app_id_name_map,
----> 8 target_device=DEVICE_ID)
9 tt.async_pull_msg()

in init(self, project_id, ground_truth, topic_name, subscription_name, app_id_name_map, target_device)
22 # create subscription
23 self._subscriber, self._subscription_path = (
---> 24 self.create_subscription(project_id, topic_name, subscription_name))
25 self._subscriber.subscribe(self._subscription_path,
26 callback=self._msg_callback)

in create_subscription(self, project_id, topic_name, subscription_name)
52 except Exception as e:
53 print('Subscription "{}" existed.'.format(subscription_name))
---> 54 return subscriber, subscription_path
55
56 def async_pull_msg(self):

UnboundLocalError: local variable 'subscriber' referenced before assignment
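
The traceback suggests that create_subscription assigns subscriber and subscription_path inside a try block while the except branch still returns them, so any exception raised before those assignments leaves the names unbound. A hedged sketch of a version that binds both names before attempting the create call, assuming the google-cloud-pubsub client the notebook appears to use:

from google.api_core import exceptions
from google.cloud import pubsub_v1

def create_subscription(project_id, topic_name, subscription_name):
    # Bind these before the try block so the "already exists" branch can
    # still return them safely.
    subscriber = pubsub_v1.SubscriberClient()
    topic_path = "projects/{}/topics/{}".format(project_id, topic_name)
    subscription_path = subscriber.subscription_path(project_id, subscription_name)
    try:
        subscriber.create_subscription(name=subscription_path, topic=topic_path)
        print('Subscription "{}" created.'.format(subscription_name))
    except exceptions.AlreadyExists:
        print('Subscription "{}" existed.'.format(subscription_name))
    return subscriber, subscription_path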

iot-nirvana: setup_gcp_environment.sh fails

The script fails trying to create a boot disk, just before deleting the compute instance. The error description:

ERROR: (gcloud.compute.images.create) Could not fetch resource:
 - The disk resource 'projects/$MY_PROJECT/zones/us-central1-a/disks/debian9-java8-img' is already being used by 'projects/$MY_PROJECT/zones/us-central1-a/instances/debian9-java8-img'

My real project id replaced with $MY_PROJECT in this error message.

Possible Solution:

  • Update the startup_install_java8.sh script to use sudo for installing default-jre
  • The poweroff command that's in the script didn't work when I tried it from the VM itself by SSHing into it.

Asset exporter tool - getting ImportError in GAE

Just tried to set up from scratch in a new project. Followed the steps from the README.
When running the cron job I get this

ImportError: cannot import name 'expr_pb2' from 'google.type' (/env/lib/python3.7/site-packages/google/type/__init__.py)
at (/env/lib/python3.7/site-packages/google/iam/v1/policy_pb2.py:16)
at (/env/lib/python3.7/site-packages/google/iam/v1/iam_policy_pb2.py:17)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/proto/assets_pb2.py:19)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/proto/asset_service_pb2.py:20)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/types.py:23)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/__init__.py:20)
at (/srv/lib/asset_inventory/export.py:33)
at (/srv/main.py:45)
at import_app (/env/lib/python3.7/site-packages/gunicorn/util.py:350)
at load_wsgiapp (/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py:41)
at load (/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py:52)
at wsgi (/env/lib/python3.7/site-packages/gunicorn/app/base.py:67)
at load_wsgi (/env/lib/python3.7/site-packages/gunicorn/workers/base.py:138)
at init_process (/env/lib/python3.7/site-packages/gunicorn/workers/base.py:129)
at init_process (/env/lib/python3.7/site-packages/gunicorn/workers/gthread.py:104)
at spawn_worker (/env/lib/python3.7/site-packages/gunicorn/arbiter.py:583)

ValueError: Unable to get the Filesystem for path gs://XXXXX/XXXXX

Hi Guys,

Trying to follow the Dataflow example "data_ingestion.py", but every time I run it I get the following error:

Traceback (most recent call last):
  File "pipeline01.py", line 115, in <module>
    run()
  File "pipeline01.py", line 97, in run
    | 'Write to BigQuery' >> beam.io.Write(
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", line 522, in __init__
    skip_header_lines=skip_header_lines)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", line 117, in __init__
    validate=validate)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsource.py", line 118, in __init__
    self._validate()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py", line 124, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsource.py", line 175, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filesystems.py", line 153, in match
    filesystem = FileSystems.get_filesystem(patterns[0])
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filesystems.py", line 84, in get_filesystem
    raise ValueError('Unable to get the Filesystem for path %s' % path)
ValueError: Unable to get the Filesystem for path gs://python-dataflow-example/data_files/head_usa_names.csv

The package versions I have on my machine:

  • apache-beam==2.4.0

  • google-cloud==0.32.0

  • google-cloud-storage==1.6.0

  • google-cloud-bigquery==1.1.0

  • google-cloud-dataflow==2.4.0

I can't spot what I'm missing or doing wrong here.
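
A hedged guess at the cause, based only on the package list above: in Beam 2.4.0 the gs:// scheme is registered by apache_beam.io.gcp.gcsfilesystem, which is only installed with the [gcp] extra, so an environment that ends up with plain apache-beam raises exactly this "Unable to get the Filesystem" error. A minimal sketch of the fix under that assumption:

# Reinstall Beam with the GCP extra so GCSFileSystem gets registered,
# keeping the 2.4.0 version already present in the environment.
pip install --upgrade 'apache-beam[gcp]==2.4.0'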

Questions and Feedback on the GSuite Exporter

Questions

  1. Are more G Suite Admin APIs supported now? The documentation indicates activity reports for admin, drive, login, mobile, and token are supported, but the error message below shows that the API accepts other activities as well:
<HttpError 400 when requesting https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/login%20token?alt=json returned "Invalid value 'login token'. Values must match the following regular expression: '(admin)|(calendar)|(drive)|(login)|(mobile)|(token)|(groups)|(saml)|(chat)|(gplus)|(rules)|(jamboard)|(meet)|(user_accounts)|(access_transparency)'
  2. Are there example screenshots of what the exported logs would look like?

  3. If the script runner is expected to authenticate with his or her G Suite / Cloud Identity Super Admin user account, how would we run this export command periodically as a cron job on a GCE instance? Can the script runner be a service account? If so, what would the authentication process look like?

  4. When I tried to use a service account as the argument for --admin-user, I got the following error:

unauthorized_client: Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested.

How would the proper authorization be granted to a service account we want to use?

Feedback

  1. One of the listed requirements is NOT accurate. roles/iam.tokenCreator should really be roles/iam.serviceAccountTokenCreator.

  2. This example, as is, yields an error, because the --applications option cannot handle a space-delimited list of values. To export additional activities, I had to run the gsuite-exporter command for each application separately (see the loop sketch after this list):

gsuite-exporter \
  --credentials-path='/path/to/service/account/credentials.json' \
  --admin-user='<your_gsuite_admin>@<your_domain>' \
  --api='report_v1' \
  --applications='login drive token' \
  --project-id='<logging_project>' \
  --exporter='stackdriver_exporter.StackdriverExporter'
  3. The process of configuring the G Suite admin account should be better documented. I had to scour the web to find documentation on a third-party site. Given how different G Suite is from GCP, please invest the time to explain in plain English why each step in the setup is required.

  4. Is there a reason this Python package is limited to Python 2.7? Please consider getting it working on Python 3 as well.

  5. Please explain the purpose of each parameter. The repo currently does not document the required parameters (--credentials-path, --admin-user, --api, --applications, --project-id, and --exporter). This is especially frustrating because a naive user would not immediately know why both --credentials-path and --admin-user are required, or that --admin-user is expected to be the G Suite / Cloud Identity Super Admin human user. Given that expectation, the README should also explain how the Super Admin is supposed to authenticate. To get the script to run, I figured out that I needed to run gcloud auth login first to authenticate with my Super Admin identity; that detail should be stated clearly in the README.
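
The per-application workaround mentioned in feedback item 2 above, sketched as a small loop; the flags and placeholder values in angle brackets are the same ones used in the example command and must be replaced with real values.

# Invoke gsuite-exporter once per application instead of passing a
# space-delimited list to --applications.
for app in login drive token; do
  gsuite-exporter \
    --credentials-path='/path/to/service/account/credentials.json' \
    --admin-user='<your_gsuite_admin>@<your_domain>' \
    --api='report_v1' \
    --applications="$app" \
    --project-id='<logging_project>' \
    --exporter='stackdriver_exporter.StackdriverExporter'
done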

Error using Cloud Asset Inventory Import To BigQuery

At step 11 in the quick start, I run:

gcloud dataflow jobs run $JOB_NAME \
  --gcs-location gs://professional-services-tools-asset-inventory/latest/import_pipeline \
  --parameters="input=$BUCKET/*.json,stage=$BUCKET/stage,load_time=$LOAD_TIME,group_by=ASSET_TYPE,dataset=asset_inventory,write_disposition=WRITE_APPEND" \
  --staging-location $BUCKET/staging

After running this command, the Dataflow console gives me these errors:

Autoscaling is enabled for job 2019-03-19_15_09_01-12592583255883649300. The number of workers will be between 1 and 1000.

Autoscaling was automatically enabled for job 2019-03-19_15_09_01-12592583255883649300.

Checking permissions granted to controller Service Account.

Staged package apache_beam-2.9.0-cp27-cp27mu-manylinux1_x86_64.whl at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/apache_beam-2.9.0-cp27-cp27mu-manylinux1_x86_64.whl' is inaccessible.

Staged package dataflow_python_sdk.tar at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/dataflow_python_sdk.tar' is inaccessible.

Staged package pickled_main_session at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/pickled_main_session' is inaccessible.

Staged package workflow.tar.gz at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/workflow.tar.gz' is inaccessible.

Workflow failed. Causes: One or more access checks for temp location or staged files failed. Please refer to other error messages for details. For more information on security and permissions, please see https://cloud.google.com/dataflow/security-and-permissions.

Cleaning up.

Worker pool stopped.
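
A hedged diagnostic sketch, assuming the failure is an access problem on the maintainer's staging bucket rather than in this project: if the staged objects cannot be read with your own credentials either, the template is pointing at objects that are no longer publicly readable, and no permission change on the controller service account will help. The object paths are taken from the error messages above.

# Check whether the template and its staged packages are readable at all.
gsutil stat 'gs://professional-services-tools-asset-inventory/latest/import_pipeline'
gsutil ls 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/'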
