
googlecloudplatform / professional-services


Python 47.74% Shell 2.08% CSS 0.32% HTML 22.80% JavaScript 1.76% Dockerfile 0.38% Go 7.20% Java 3.33% TypeScript 2.95% Makefile 0.26% HCL 4.31% Scala 1.89% Jinja 1.10% Smarty 0.36% C# 2.17% PLSQL 1.19% PowerShell 0.17%
google-cloud-platform google-cloud-dataflow google-cloud-ml google-cloud-compute gke bigquery solutions tools examples

professional-services's Introduction

Professional Services

Common solutions and tools developed by Google Cloud's Professional Services team.

Disclaimer

This repository and its contents are not an officially supported Google product.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.


Examples

The examples folder contains example solutions across a variety of Google Cloud Platform products. Use these solutions as a reference for your own or extend them to fit your particular use case.

Tools

The tools folder contains ready-made utilities which can simplify Google Cloud Platform usage.

  • Agile Machine Learning API - A web application which provides the ability to train and deploy ML models on Google Cloud Machine Learning Engine, and visualize the predicted results using LIME through simple POST requests.
  • Airflow DAG Metadata Generator - Use Google's generative models to analyze Airflow DAGs and supplement them with generated description, tags, and doc_md values.
  • Airflow States Collector - A tool that creates and uploads an Airflow DAG to the DAGs GCS folder. The DAG incrementally collects Airflow task states and stores them in BigQuery. It also autogenerates a Looker Studio dashboard that queries the BigQuery view.
  • Airpiler - A Python script to convert Autosys JIL files to dag-factory format to be executed in Cloud Composer (managed Airflow environment).
  • Ansible Module for Anthos on Bare Metal - Ansible module for installation of Anthos on Bare Metal.
  • Anthos Bare Metal Installer - An Ansible playbook that can be used to install Anthos Bare Metal.
  • Apache Beam Client Throttling - A library that can be used to limit the number of requests from an Apache Beam pipeline to an external service. It buffers requests to not overload the external service and activates client-side throttling when the service starts rejecting requests due to out of quota errors.
  • API Key Rotation Checker - A tool that checks your GCP organization for API keys and compares them to a customizable rotation period. Regularly rotating API keys is a Google and industry standard recommended best practice.
  • AssetInventory - Import Cloud Asset Inventory resources into BigQuery.
  • BigQuery Discount Per-Project Attribution - A tool that automates the generation of a BigQuery table that uses existing exported billing data, by attributing both CUD and SUD charges on a per-project basis.
  • BigQuery Policy Tag Utility - Utility class for tagging BQ Table Schemas with Data Catalog Taxonomy Policy Tags. Create BQ Authorized Views using Policy Tags. Helper utility to provision BigQuery Dataset, Data Catalog Taxonomy and Policy Tags.
  • BigQuery Query Plan Exporter - Command line utility for exporting BigQuery query plans in a given date range.
  • BigQuery Query Plan Visualizer - A web application which provides the ability to visualise the execution stages of BigQuery query plans to aid in the optimization of queries.
  • BigQuery z/OS Mainframe Connector - A utility used to load COBOL MVS data sets into BigQuery and execute query and load jobs from the IBM z/OS Mainframe.
  • Boolean Organization Policy Enforcer - A tool that finds projects where a boolean organization policy is not in its expected state and then sets the policy to that expected state.
  • Capacity Planner CLI - A stand-alone tool to extract peak resource usage values and corresponding timestamps for a given GCP project, time range and timezone.
  • Capacity Planner Sheets Extension - A Google Sheets extension to extract peak resource usage values and corresponding timestamps for a given GCP project, time range and timezone.
  • CloudConnect - A package that automates the setup of dual VPN tunnels between AWS and GCP.
  • Cloudera Parcel GCS Connector - This script helps you create a Cloudera parcel that includes the Google Cloud Storage connector. The parcel can be deployed on a Cloudera-managed cluster.
  • Cloud AI Vision Utilities - This is an installable Python package that provides support tools for Cloud AI Vision. Currently there are a few scripts for generating an AutoML Vision dataset CSV file from either raw images or image annotation files in PASCAL VOC format.
  • Cloud Composer Backup and Recovery - A command line tool for applying backup and recovery operations on Cloud Composer Airflow environments.
  • Cloud Composer DAG Validation - An automated process for running validation and testing against DAGs in Composer.
  • Cloud Composer Migration Complexity Assessment - An Airflow DAG that uses a variety of tools to analyze a Cloud Composer 1 environment, determine a work estimate, and accelerate the conversion of Airflow 1 DAGs to Airflow 2 DAGs.
  • Cloud Composer Migration Terraform Generator - Analyzes an existing Cloud Composer 1 / Airflow 1 environment and generates Terraform that configures a new Cloud Composer 2 environment to meet your workload demands.
  • CUD Prioritized Attribution - A tool that allows GCP customers who purchased Committed Use Discounts (CUDs) to prioritize a specific scope (e.g. project or folder) to attribute CUDs first before letting any unconsumed discount float to other parts of an organization.
  • Custom Role Analyzer - Provides insights into custom roles at the organization and project level, identifying the predefined roles from which each custom role is built.
  • Custom Role Manager - Manages organization- or project-level custom roles by combining predefined roles and including and removing permissions with wildcards. Can run as Cloud Function or output Terraform resources.
  • Dataproc Event Driven Spark Recommendations - Use Google Cloud Functions to analyze Cloud Dataproc clusters and recommend best practices for Apache Spark jobs. Also logs cluster configurations for future reference.
  • Dataproc Scheduled Cluster Sizing - Use Google Cloud Scheduler and Google Cloud Functions to schedule the resizing of a Dataproc cluster. Changes the primary and secondary worker count.
  • DataStream Deployment Automation - Python script to automate the deployment of Google Cloud DataStream. The script creates connection profiles, creates a stream, and starts it.
  • DLP to Data Catalog - Inspect your tables using Data Loss Prevention for PII data and automatically tag it on Data Catalog using Python.
  • DNS Sync - Sync a Cloud DNS zone with GCE resources. Instances and load balancers are added to the cloud DNS zone as they start from compute_engine_activity log events sent from a pub/sub push subscription. Can sync multiple projects to a single Cloud DNS zone.
  • Firewall Enforcer - Automatically watch and remove illegal firewall rules across an organization. Firewall rules are monitored by a Cloud Asset Inventory feed, which triggers a Cloud Function that inspects each firewall rule and deletes it if it fails a test.
  • GCE Disk Encryption Converter - A tool that converts disks attached to a GCE VM instance from Google-managed keys to a customer-managed key stored in Cloud KMS.
  • GCE switch disk-type - A tool that changes the type of disks attached to a GCE instance.
  • GCE Quota Sync - A tool that fetches resource quota usage from the GCE API and synchronizes it to Stackdriver as a custom metric, where it can be used to define automated alerts.
  • GCE Usage Log - Collect GCE instance events into a BigQuery dataset, surfacing your vCPUs, RAM, and Persistent Disk, sliced by project, zone, and labels.
  • GCP Architecture Visualizer - A tool that takes CSV output from a Forseti Inventory scan and draws a dynamic hierarchical tree diagram of org -> folders -> projects -> gcp_resources using the D3.js JavaScript library.
  • GCP AWS HA VPN Connection terraform - Terraform script to set up HA VPN between GCP and AWS.
  • GCP Azure HA VPN Connection Terraform - Terraform code to set up HA VPN between GCP and Microsoft Azure.
  • GCP Organization Hierarchy Viewer - A CLI utility for visualizing your organization hierarchy in the terminal.
  • GCPViz - A visualization tool that takes input from Cloud Asset Inventory, creates relationships between assets, and outputs a format compatible with Graphviz.
  • GCS Bucket Mover - A tool to move a user's bucket, including objects, metadata, and ACLs, from one project to another.
  • GCS to BigQuery - A tool that fetches object metadata from all Google Cloud Storage buckets and exports it in a format that can be imported into BigQuery for further analysis (a minimal illustrative sketch of this pattern appears after this list).
  • GCS Usage Recommender - A tool that generates bucket-level intelligence and access patterns across a GCP project to recommend object lifecycle management policies.
  • GCVE2BQ - A tool for scheduled exports of VM, datastore and ESXi utilization data from vCenter to BigQuery for billing and reporting use cases.
  • GKE AutoPSC Controller - Google Kubernetes Engine controller, to setup PSC ServiceAttachment for Gateway API managed Forwarding Rules.
  • Global DNS -> Zonal DNS Project Bulk Migration - A shell script for gDNS-zDNS project bulk migration.
  • GKE Billing Export - Google Kubernetes Engine fine grained billing export.
  • gmon - A command-line interface (CLI) for Cloud Monitoring written in Python.
  • Google Cloud Support Slackbot - Slack application that pulls Google Cloud support case information via the Cloud Support API and pushes the information to Slack
  • GSuite Exporter Cloud Function - A script that deploys a Cloud Function and Cloud Scheduler job that executes the GSuite Exporter tool automatically on a cadence.
  • GSuite Exporter - A Python package that automates syncing Admin SDK APIs activity reports to a GCP destination. The module takes entries from the chosen Admin SDK API, converts them into the appropriate format for the destination, and exports them to a destination (e.g: Stackdriver Logging).
  • Hive to BigQuery - A Python framework to migrate Hive tables to BigQuery, using Cloud SQL to keep track of the migration progress.
  • IAM Permissions Copier - This tool allows you to copy supported GCP IAM permissions from unmanaged users to managed Cloud Identity users.
  • IAM Recommender at Scale - A Python package that automates applying IAM recommendations.
  • Instance Mapper - Maps different IaaS VM instance types from EC2 and Azure Compute to Google Cloud Platform instance types using a customizable score-based method. Also supports database instances.
  • IPAM Autopilot - A simple tool for managing IP address ranges for GCP subnets.
  • K8S-2-GSM - A containerized Go app to migrate Kubernetes secrets to Google Secret Manager (to leverage the CSI secret driver).
  • LabelMaker - A tool that reads key:value pairs from a JSON file and labels the running instance and all attached drives accordingly.
  • Logbucket Global to Regional - Utility to change _Default sink destination to regional log buckets
  • Machine Learning Auto Exploratory Data Analysis and Feature Recommendation - A tool that performs comprehensive automated EDA, makes feature recommendations based on it, and generates a summary report.
  • Maven Archetype Dataflow - A maven archetype which bootstraps a Dataflow project with common plugins pre-configured to help maintain high code quality.
  • Netblock Monitor - An Apps Script project that will automatically provide email notifications when changes are made to Google’s IP ranges.
  • OpenAPI to Cloud Armor converter - A simple tool to generate Cloud Armor policies from OpenAPI specifications.
  • Permission Discrepancy Finder - A tool that finds principals with missing permissions on a resource within a project and then grants them the missing permissions.
  • Pubsub2Inbox - A generic Cloud Function-based tool that takes input from Pub/Sub messages and turns them into email, webhooks or GCS objects.
  • Quota Manager - A Python module to programmatically update GCP service quotas such as bigquery.googleapis.com.
  • Quota Monitoring and Alerting - An easy-to-deploy Data Studio Dashboard with alerting capabilities, showing usage and quota limits in an organization or folder.
  • Ranger Hive Assessment for BigQuery/BigLake IAM migration - A tool that assesses which Ranger authorization rules can be migrated or not to BigQuery/BigLake IAM.
  • Reddit Comment Streaming - Use PRAW, TextBlob, and the Google Python API to collect and analyze Reddit comments. Pushes comments to a Google Pub/Sub topic.
  • Secret Manager Helper - A Java library to make it easy to replace placeholder strings with Secret Manager secret payloads.
  • Service Account Provider - A tool to exchange GitLab CI JWT tokens for GCP IAM access tokens, in order to allow GitLab CI jobs to access Google Cloud APIs.
  • Site Verification Group Sync - A tool to provision "verified owner" permissions (to create GCS buckets with custom DNS) based on membership of a Google Group.
  • SLO Generator - A Python package that automates computation of Service Level Objectives, Error Budgets and Burn Rates on GCP, and exports the computation results to available exporters (e.g. Pub/Sub, BigQuery, Stackdriver Monitoring), using policies written in JSON format.
  • Snowflake_to_BQ - A shell script to transfer tables (schema and data) from Snowflake to BigQuery.
  • SPIFFE GCP Proxy - A tool to ease the integration of SPIFFE-supported on-prem workloads with GCP APIs using Workload Identity Federation.
  • STS Job Manager - A petabyte-scale bucket migration tool utilizing Storage Transfer Service.
  • VM Migrator - This utility automates migrating Virtual Machine instances within GCP. You can migrate VMs from one zone to another zone/region, within the same project or to different projects, while retaining all the original VM properties such as disks, network interfaces, IPs, metadata, network tags and much more.
  • VPC Flow Logs Analysis - A configurable Log sink + BigQuery report that shows traffic attributed to the projects in the Shared VPCs.
  • VPC Flow Logs Enforcer - A Cloud Function that will automatically enable VPC Flow Logs when a subnet is created or modified in any project under a particular folder or folders.
  • VPC Flow Logs Top Talkers - A configurable Log sink + BigQuery view to generate monthly/daily aggregate traffic reports per subnet or host, with the configurable labelling of IP ranges and ports.
  • Webhook Ingestion Data Pipeline - A deployable app to accept and ingest unauthenticated webhook data to BigQuery.
  • XSD to BigQuery Schema Generator - A command line tool for converting an XSD schema representing deeply nested and repeated XML content into a BigQuery compatible table schema represented in JSON.
  • Numeric Family Recommender - Oracle - The Numeric Family Recommender is a database script that recommends the best numeric data type for the NUMBER data type when migrating from legacy databases like Oracle to Google Cloud platforms like BigQuery, AlloyDB, Cloud SQL for PostgreSQL, and Google Cloud Storage.
  • Composer DAG Load Generator - An automatic DAG generator tool that creates test workloads on a Cloud Composer environment, to test different Airflow configurations or to fine-tune them using Composer/Airflow metrics.
  • Gradio and Generative AI Example - The example code allows developers to create rapid Generative AI PoC applications with Gradio and Gen AI agents.
  • Memorystore Cluster Ops Framework - A framework that provides the tools to apply cluster-level operations such as cluster backups, migration and validation. The framework can be extended for other use cases as required; it uses RIOT to bridge current product gaps with Memorystore Clusters.
  • ML Project Generator - A utility to create a production-grade ML project template with the best productivity tools installed, like auto-formatting, license checks, and linting.
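
To make the GCS to BigQuery entry above concrete, the following is a minimal, hedged sketch of the general pattern that tool describes: list object metadata with the google-cloud-storage client and stream it into an existing BigQuery table with insert_rows_json. This is not the repository's tool itself; the function name, table ID, and field selection are illustrative assumptions, and the destination table is assumed to already exist with a matching schema.

from google.cloud import bigquery, storage

# Illustrative sketch only (not the repository's gcs-to-bigquery tool):
# collect object metadata from every bucket in a project and stream it into
# an existing BigQuery table whose schema matches the dict keys below.
def export_object_metadata(project_id, table_id):
    gcs = storage.Client(project=project_id)
    bq = bigquery.Client(project=project_id)

    rows = []
    for bucket in gcs.list_buckets():
        for blob in gcs.list_blobs(bucket.name):
            rows.append({
                "bucket": bucket.name,
                "name": blob.name,
                "size_bytes": blob.size,
                "storage_class": blob.storage_class,
                "updated": blob.updated.isoformat() if blob.updated else None,
            })

    # Streaming insert keeps the sketch short; a batch load job may be
    # preferable for large inventories.
    errors = bq.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError("BigQuery insert errors: %s" % errors)

# Example call (placeholder IDs):
# export_object_metadata("my-project", "my-project.metadata.gcs_objects")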

Contributing

See the contributing instructions to get started contributing.

Contact

Questions, issues, and comments should be directed to [email protected].

professional-services's People

Contributors

adrienwalkowiak, alexamies, arthurarg, bmenasha, boredabdel, cyarros10, danieldeleo, dependabot[bot], freedomofnet, galic1987, henryken, iht, kardiff18, ludoo, matt-gen, michaelwsherman, mihir25, misabhishek, morgante, mwallman, omerhabas, pkattamuri, prathapreddy123, rilkeanheart, rosmo, ryanmcdowell, sahanasub, satishathukuri, smeyn, tswast


professional-services's Issues

Export to csv / json

Hi.
Is it possible to export Google Drive activity (i.e. file downloads) to a CSV/JSON file, filtering by a date range?

gke redis-cluster deployment error.

Hi.

Thanks for the article about the GKE redis-cluster example; it is very interesting.

I completed the configuration, and when I try to test it, I get the following error.

kubectl get pods -l app=redis,redis-type=cache -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP          NODE                                        NOMINATED NODE   READINESS GATES
redis-cache-64f4d98457-7lph9   1/1     Running   0          19m     10.0.1.34   gke-dev-ironia-pruebas-d1d66cb9-d9q7        <none>           <none>
redis-cache-64f4d98457-bkwx4   1/1     Running   0          6m58s   10.0.1.27   gke-dev-ironia-default-pool-1f34a49a-nc7b   <none>           <none>
avalencia@cloudshell:~/redis/professional-services/examples/redis-cluster-gke$ kubectl run -it redis-cli --image=redis --restart=Never /bin/bash
If you don't see a command prompt, try pressing enter.
root@redis-cli:/data# redis-cli -c -p 6379 -h 10.0.1.34
10.0.1.34:6379> set foo bar
(error) CLUSTERDOWN Hash slot not served
10.0.1.34:6379>

Any idea about the problem?

Thanks in advance

Missing Dataflow python code - data_generation_for_benchmarking.py

Hi - it looks like the file data_generation_for_benchmarking.py has been removed from the current master? This was previously available in "professional-services/examples/dataflow-python-examples/dataflow_python_examples"

I use this program in a Dataflow example, so could you indicate whether there is an alternative I can use, or whether I should continue to use this example given that the code has been deleted?

thanks

Fix vulnerabilities in bq-visualizer

gts init -y returns the following for bq-visualizer:

audited 43161 packages in 12.274s
found 616 vulnerabilities (1 low, 1 moderate, 612 high, 2 critical)
run npm audit fix to fix them, or npm audit for details

Please update your packages to fix the vulnerabilities.

Cloudera GCS connector parcel - DistCp guide

Hi! This is just a suggestion, not really an issue -- please feel free to close this if this is not the right place to raise this.

TL;DR: it would be nice to show in the guide how to make the connector work for DistCp.

When we follow the guide, there is an issue that always pops up every time we do a DistCp transfer:

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

This results from the HADOOP_CLASSPATH not being updated on all the worker nodes.

The README shows the following snippets for testing:

$ hdfs dfs -ls gs://bucket_name

However, we found out that this is not a good enough test for DistCp, since the -ls command does not need the HADOOP_CLASSPATH to be set in other worker nodes.

It would be very useful to have the Cloudera environment settings for DistCp in the guide, so it would be much more painless to onboard data engineers into GCP.

Thanks!

Asset Inventory tool : Installation issues (json & missing dependencies)

Hi guys

I ran into some issues while following your installation instructions for the Asset Inventory tool:

  • In the config.yaml file, the key "import_pipeline_runtime_environment" has an invalid JSON value. After I fixed it, the issue went away.
  • There is a missing dependency in your requirements.txt file: "requests-futures".

Thx
Evgeny

gce-to-adminsdk generates a lot of errors despite working

Is it normal for gce-to-adminsdk to log a bunch of errors while running in Cloud Functions? It works (makes authenticated API calls to Admin SDK APIs), but every time it runs it produces a good 20 error entries in Stackdriver Logging.

E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth') activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from . import file_cache activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000021-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"function_name":"activities_list_to_json","project_id":"pg-gx-n-app-934447","region":"us-central1"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000019-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'oauth2client' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from oauth2client.locked_file import LockedFile activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 37, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000014-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000012-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'oauth2client' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from oauth2client.contrib.locked_file import LockedFile activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 33, in <module> activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000007-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 During handling of the above exception, another exception occurred: activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 {"insertId":"000005-84c0778b-e6aa-4ddb-9877-3124648fb453","resource":{"type":"cloud_function","labels":{"project_id":"pg-gx-n-app-934447","region":"us-central1","function_name":"activities_list_to_json"}},"timestamp":"2019-08-23T13:50:20.535Z","severity":"ERROR","labels":{"execution_id":"71596292954… activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 ModuleNotFoundError: No module named 'google.appengine' activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044     from google.appengine.api import memcache activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044   File "/env/local/lib/python3.7/site-packages/googleapiclient/discovery_cache/__init__.py", line 36, in autodetect activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 Traceback (most recent call last): activities_list_to_json 715962929541044 
E 2019-08-23T13:50:20.535Z activities_list_to_json 715962929541044 file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth activities_list_to_json 715962929541044 

Thoughts?
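
These entries are the well-known googleapiclient discovery-cache fallback: file_cache needs oauth2client < 4.0, so the client logs the failed import chain and then continues without a cache, which makes them noise rather than failures. A hedged sketch of silencing them, assuming the function builds its Admin SDK client with googleapiclient.discovery.build (the key file path, scope, and admin address below are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Disabling the discovery cache skips the oauth2client/file_cache import
# chain that produces the ImportError entries shown in the logs above.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder key file
    scopes=["https://www.googleapis.com/auth/admin.reports.audit.readonly"],
).with_subject("admin@example.com")  # placeholder delegated admin

reports = build(
    "admin", "reports_v1", credentials=credentials, cache_discovery=False
)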

Cloudera GCS connector parcel script does not copy directory files

When running the script, files in the same directory (e.g. the JSON key) are not copied over to the parcel folder.

In the following script:
professional-services/tools/cloudera-parcel-gcsconnector/create_parcel.sh

  # Download gcs connector jar and copy all folder content to parcel location
  # Set flag for further parcel file existence validation
  curl -o gcs-connector-hadoop2-latest.jar --fail ${GCSJAR_LINK} || GCSJAR_FLAG="0"

  # Validate if package downloaded properly
  if [[ ${GCSJAR_FLAG} = "0" ]]; then
    echo "Error: hadoop connector failed to download, check network connectivity or file/folder permissions"
    graceful_exit
  fi

  cp gcs-connector-hadoop2-latest.jar ${PARCEL_FULLNAME}/lib/hadoop/lib/

This part of the script only copies the jar file, but not other files in the folder.

I noticed that the correct code was removed in this commit, specifically:

##Download gcs connector jar and copy all folder content to parcel location
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar
if [[ $? -ne 0 ]]; then
   echo "Download GCS connector failed!"
   exit 1
fi

cp * ${filen^^}-$version/lib/hadoop/lib/

## Create parcel.json file required for parcel packaging
cat >>${filen^^}-$version/meta/parcel.json<< EOL

Asset inventory issue

Running into duplicate key entries at the BigQuery level when using the Asset Inventory App Engine app via the Dataflow template.

[root@jump01 asset_inventory]# bq show -j
Job project1:

Job Type State Start Time Duration User Email Bytes Processed Bytes Billed Billing Tier Labels


load FAILURE 16 Sep 18:57:16 0:00:00.633000 [email protected]

Error encountered during job execution:
Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:

  • Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0
  • gs://-assets/stage/2019-09-16T22:48:25.550864/compute.googleapis.com/GlobalForwardingRule.0.json: Error while reading data, error message: JSON parsing error in row starting at position 0: Multiple definition of field: resource.data.ipProtocol

gke-billing-export can't scrape pod data after one hour anymore

We deployed gke-billing-export on one of our GKE clusters to gather data from six clusters running in three different projects.
It is running with its own service account as stated in the README; permissions were granted as described.
The BigQuery dataset has not been created in any of those projects, it's placed in its own project.

Our setup looks like the following:
project1: cluster1
project2: cluster2
project3: cluster3-6
The used Kubernetes versions differ slightly: Cluster1 and 2 are running on v1.11.5-gke.5, clusters 3-6 are running partially on v1.11.2-gke.25 and v1.11.4-gke.12

During startup, the app discovers all clusters in the three projects, scrapes data from the Kubernetes Master API and writes it successfully into BigQuery.
After one hour, it can no longer query the Kubernetes Master API and it never recovers from that. In the log, we're seeing the following (I replaced the names of projects and clusters with generic ones):

2019/01/14 16:52:05 Fetching a list of all clusters
2019/01/14 16:52:06   project1/cluster1
2019/01/14 16:52:06   project2/cluster2
2019/01/14 16:52:07   project3/cluster3
2019/01/14 16:52:07   project3/cluster4
2019/01/14 16:52:07   project3/cluster5
2019/01/14 16:52:07   project3/cluster6
2019/01/14 16:52:07 Sent 122 rows to bigquery for project "project1" cluster "cluster1" in 654.644221ms
2019/01/14 16:52:07 Sent 211 rows to bigquery for project "project2" cluster "cluster2" in 659.385595ms
2019/01/14 16:52:07 Sent 20 rows to bigquery for project "project3" cluster "cluster4" in 671.863548ms
2019/01/14 16:52:08 Sent 348 rows to bigquery for project "project3" cluster "cluster3" in 1.195654306s
2019/01/14 16:52:08 Sent 291 rows to bigquery for project "project3" cluster "cluster5" in 1.709932261s
2019/01/14 16:52:10 Sent 259 rows to bigquery for project "project3" cluster "cluster6" in 3.468579094s
[...]
2019/01/14 17:52:07 Error fetching pods for project3/cluster3: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster4: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project2/cluster2: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project1/cluster1: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster5: Unauthorized
2019/01/14 17:52:07 Error fetching pods for project3/cluster6: Unauthorized

We tested different intervals (from 60s to 600s) for scraping and can reproduce the behaviour every time. It works for exactly one hour and then we only see the "Unauthorized" log messages.

Any help what/where to check would be appreciated.

iot-nirvana: Build the solution - Error

The following error was received after using "mvn clean install"
-------------------------------------------------------
T E S T S

Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for IoT Nirvana 0.1-SNAPSHOT:
[INFO]
[INFO] IoT Nirvana ........................................ SUCCESS [ 0.810 s]
[INFO] IoT Nirvana Common ................................. FAILURE [ 6.042 s]
[INFO] IoT Nirvana Frontend ............................... SKIPPED
[INFO] IoT Nirvana Client ................................. SKIPPED
[INFO] IoT Nirvana Pipeline ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.324 s
[INFO] Finished at: 2019-01-26T13:40:21Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project google-cloud-demo-iot-nirvana-common: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test failed: The forked VM terminated without saying properly goodbye. VM crash or System.exit called ? -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :google-cloud-demo-iot-nirvana-common

bq-visualizer: fix building process / improve how to build the code docs.

While building the code for bq-visualizer:

npm install
ng build

and I get the following error

ERROR in src/app/app-routing.module.ts(18,29): error TS2307: Cannot find module './main/main.component'.
src/app/app-routing.module.ts(19,30): error TS2307: Cannot find module './terms/terms.component'.
src/app/app.module.ts(44,29): error TS2307: Cannot find module './main/main.component'.
src/app/app.module.ts(50,30): error TS2307: Cannot find module './terms/terms.component'.

I am not sure if the code is broken or I am not building it correctly; if it is the former, it would be nice to have a better example of how to build the app in the docs.

https://github.com/GoogleCloudPlatform/professional-services/blob/master/tools/bq-visualizer/README.md

Unable to generate and publish mqtt

I'd like to offer some feedback that may help other users.

As I've run it in Python 3, I had to explicitly convert the iterator into a sequence (list, tuple, etc.) so it could be unpacked multiple times.
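
A minimal illustration of the Python 3 behaviour described above, with made-up sample data: zip() returns a one-shot iterator, so it has to be materialized into a sequence before it can be unpacked or iterated more than once.

timestamps = [0, 1, 2]
readings = [0.5, 0.7, 0.9]

pairs = zip(timestamps, readings)        # Python 3: a one-shot iterator
first_pass = list(pairs)                 # consuming it here...
second_pass = list(pairs)                # ...leaves nothing for a second pass: []

pairs = list(zip(timestamps, readings))  # materialize once, reuse freely
print(first_pass, second_pass, pairs)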


Hi @lerrytang ,
I'm unable to generate/publish the MQTT payload when running EnergyDisaggregationDemo_Client.ipynb locally, even though no error message is produced.

Would you help me please? What can I do to make it work?

http://localhost:8890/notebooks/EnergyDisaggregationDemo_Client.ipynb#

before data trimming: data.shape=(432000,)
after data trimming: data.shape=(131820,)
Creating JWT using RS256 from private key file rs256.key
connected: Connection Accepted.
connected: Connection Accepted.
connected: Connection Accepted.
...


GCS Bucket Mover - bucket policy only

Hi,

I love the GCS Bucket Mover tool - but would it be possible to use it in combination with the bucket policy only feature? It looks like it now really depends on ACLs to exist or at least be accessible, which isn't the case when bucket policy only is activated.

Thanks,

Wietse

Verify / Update all python assets for Python3 Compatibility

This issue is to track updates to existing assets to keep them from going stale in the coming Python 2 deprecation.

Getting list of python assets

find ./professional-services -name '*.py' -type f | awk '{ FS = "/"}; {print $3 "." $4}' | uniq
  • tools.dns-sync
  • tools.bigquery-query-plan-exporter
  • tools.bqpipeline
  • tools.ml-dataprep
  • tools.asset-inventory
  • tools.gsuite-exporter
  • tools.gcs-bucket-mover
  • tools.agile-machine-learning-api
  • tools.labelmaker
  • tools.hive-bigquery
  • tools.kunskap
  • tools.cloudconnect
  • tools.site-verification-group-sync
  • tools.gce-quota-sync
  • tools.cloud-vision-utils
  • examples.cloudml-bee-health-detection
  • examples.cloudml-collaborative-filtering
  • examples.dataproc-persistent-history-server
  • examples.e2e-home-appliance-status-monitoring
  • examples.dataflow-python-examples
  • examples.cloud-composer-examples
  • examples.cloudml-churn-prediction
  • examples.bigquery-row-access-groups
  • examples.qaoa
  • examples.cloudml-sklearn-pipeline
  • examples.dataflow-data-generator
  • examples.gcf-pubsub-vm-delete-event-handler
  • examples.cloudml-fraud-detection
  • examples.bigquery-cross-project-slot-monitoring
  • examples.kubeflow-fairing-example
  • examples.gce-to-adminsdk
  • examples.kubeflow-pipelines-sentiment-analysis
  • examples.dlp
  • examples.tensorflow-profiling-examples
  • examples.cloudml-energy-price-forecasting
  • examples.cloudsql-custom-metric
  • examples.python-cicd-with-cloudbuilder
  • examples.cryptorealtime
  • examples.cloudml-sentiment-analysis
  • helpers.sort_lists.py

Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

While building the iot-nirvana demo I stumbled upon this issue:
Error: Could not find or load main class org.apache.maven.surefire.booter.ForkedBooter

I fixed the issue as mentioned in this link:
https://stackoverflow.com/questions/53010200/maven-surefire-could-not-find-forkedbooter-class

by updating the pom.xml to include:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.1</version>
        <configuration>
          <useSystemClassLoader>false</useSystemClassLoader>
        </configuration>
      </plugin>
    </plugins>
  </build>
  [...]
</project>

direct-upload-to-gcs example not working

I've successfully deployed the direct-upload-to-gcs example; however, I am getting a '404 page not found' error from the service when I try to create a signed URL. Are the steps provided in the example complete?

Data Generator dependencies broken in 2to3 migration

python data_generator_pipeline.py \
           --schema_file=../../bq_file_load_benchmark/json_schemas/benchmark_table_schemas/100_STRING_10.json \
           --num_records=10 \
           --output_bq_table=data-analytics-pocs:bqbml_test_staging_dataset.100_STRING_10  \
           --project=data-analytics-pocs \
           --setup_file=./setup.py \
           --staging_location=gs://bq_benchmark_dataflow_test/staging \
           --temp_location=gs://bq_benchmark_dataflow_test/temp  \
           --save_main_session \
           --worker_machine_type=n1-highcpu-32 \
           --runner=DataflowRunner

and I'm getting

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 770, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 488, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 314, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/local/lib/python3.7/site-packages/data_generator/PrettyDataGenerator.py", line 30, in <module>
    from google.cloud import storage as gcs
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/__init__.py", line 39, in <module>
    from google.cloud.storage.blob import Blob
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 46, in <module>
    from google.resumable_media.requests import RawDownload
ImportError: cannot import name 'RawDownload' from 'google.resumable_media.requests' (/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/__init__.py)

This seems to be an issue with Python 3 and an old version of Beam / avro-python3.
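
The traceback points at a version skew on the Dataflow workers: the installed google-cloud-storage imports RawDownload, which only exists in newer google-resumable-media releases. A hedged sketch of pinning the two together in the pipeline's setup.py; the package name and version bounds are illustrative assumptions, not values verified against this repo:

from setuptools import find_packages, setup

setup(
    name="data-generator-pipeline",  # placeholder name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # Keep these two in lockstep so the storage client and the
        # resumable-media library agree on the RawDownload interface.
        "google-cloud-storage>=1.26.0",
        "google-resumable-media>=0.5.0",
    ],
)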

"gcloud ml-engine commands have been renamed" message on e2e-home-appliance-status-monitoring example

Hello, while following the e2e-home-appliance-status-monitoring example I got the warning below on step 1.

gcloud ml-engine models create EnergyDisaggregationModel --regions ${REGION} --project ${GOOGLE_PROJECT_ID}

WARNING: The gcloud ml-engine commands have been renamed and will soon be removed. Please use gcloud ai-platform instead.

I guess it would be better to update the README with the new commands.

Thanks!!

Can you build multiple redis-cluster-gke with different namespaces?

Can you build multiple redis-cluster-gke with different namespaces?

MountVolume.SetUp failed for volume "redis-nodes" : configmap references non-existent config key: redis-nodes.txt

redis-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-conf
  namespace: test
data:
  redis.conf: |
    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes

redis-expect.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-expect
  namespace: test
data:
  redis-expect.script: |+
    #!/usr/bin/expect
    spawn /bin/bash
    expect "#" {
      send "(cat /tmp/redis-nodes/redis-nodes.txt) | (xargs -o /tmp/redis-stable/src/redis-cli --cluster create --cluster-replicas 1)\r"
    }
    expect "to accept):" {
      send "yes\r"
    }
    expect "#" {
      send "(cat /tmp/redis-nodes/redis-nodes.txt) | (while read node; do /tmp/redis-stable/src/redis-cli --cluster check \${node}; done)\r"
    }
    interact

redis-cache.yaml

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
  namespace: test
spec:
  minAvailable: 66%
  selector:
    matchLabels:
      app: redis
      redis-type: cache
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
  namespace: test
spec:
  selector:
    matchLabels:
      app: redis
  replicas: 6
  template:
    metadata:
      labels:
        app: redis
        redis-type: cache
        namespace: test
    spec:
      hostNetwork: true
      nodeSelector:
        cloud.google.com/gke-nodepool: test-redis-pool
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: redis-server
          image: "redis:4.0-alpine"
          imagePullPolicy: Always
          command:
            - "redis-server"
          args:
            - "/etc/redis/redis.conf"
            - "--protected-mode"
            - "no"
          resources:
            requests:
              cpu: "1"
              memory: "5Gi"
          ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: redis-cluster
              containerPort: 16379
              protocol: "TCP"
          volumeMounts:
            - name: "redis-conf"
              mountPath: "/etc/redis"
      volumes:
        - name: "redis-conf"
          configMap:
            name: "redis-conf"
            items:
              - key: "redis.conf"
                path: "redis.conf"

redis-create-cluster.yaml

# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: batch/v1
kind: Job
metadata:
  name: redis-create-cluster
  namespace: test
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 600
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: test-redis-pool
      containers:
      - name: redis-cli
        image: ubuntu
        command: ["/bin/bash", "-c"]
        args: ["apt-get update && \
          DEBIAN_FRONTEND=noninteractive apt-get install -yq tzdata && \
          ln -fs /usr/share/zoneinfo/Asia/Tokyo /etc/localtime && \
          dpkg-reconfigure --frontend noninteractive tzdata && \
          DEBIAN_FRONTEND=noninteractive apt-get install -yq curl gcc make libjemalloc-dev expect && \
          cd /tmp && \
          curl -LO http://download.redis.io/redis-stable.tar.gz && \
          tar zxvf redis-stable.tar.gz && \
          cd redis-stable && \
          make distclean && \
          make && \
          expect -f /tmp/redis-expect/redis-expect.script"]
        volumeMounts:
          - name: "redis-nodes"
            mountPath: "/tmp/redis-nodes"
          - name: "redis-expect"
            mountPath: "/tmp/redis-expect"
      restartPolicy: Never
      volumes:
        - name: "redis-nodes"
          configMap:
            name: "redis-nodes"
            items:
              - key: "redis-nodes.txt"
                path: "redis-nodes.txt"
        - name: "redis-expect"
          configMap:
            name: "redis-expect"
            items:
              - key: "redis-expect.script"
                path: "redis-expect.script"

Asset inventory issue : load failed due to TIMESTAMP to STRING change ???

Hi there,
Cloud Dataflow job is failing due to this error:
sqladmin_googleapis_com_Instance. Field resource.data.scheduledMaintenance.startTime has changed type from TIMESTAMP to STRING [while running 'load_to_bigquery/load_to_bigquery']
this occurs on:

  • sqladmin_googleapis_com_Instance
  • compute_googleapis_com_RegionBackendService.
  • k8s_io_Pod
  • ...

What can I do to correct this?
Thanks a lot for your help.

Asset Inventory tool : pipeline failed on one of the last steps

Hi guys

After I got the view permissions on gs://professional-services-tools-asset-inventory/latest/import_pipeline template, I managed to run it via GAE app.
But it still failed. You can see the screenshot 👍
Screen Shot 2019-04-30 at 13 18 39

In the pipeline options I see parameters which I guess I should be able to change.
Screen Shot 2019-04-30 at 13 21 13

For example: temp_location, staging_location, project (bmenasha-1).
How can I change them, and could that be the reason why the pipeline failed?

Thx

Resolve tensorflow security issue

Update requirements.txt to depend on a more recent version.

tensorflow>=1.12.1

Vulnerable versions: >= 1.0.0, < 1.12.1
Patched version: 1.12.1
NULL pointer dereference in Google TensorFlow before 1.12.2 could cause a denial of service via an invalid GIF file.

Asset Inventory tool : error during pipeline running

Hi guys
I am using the Asset Inventory tool and I got an error during pipeline running:
ImportError: No module named asset_inventory.api_schema
I tried to use save_main_session = True option but it didn't help.

What am I doing wrong?

Thx

Create Hangouts Chat Bot version

This is great. As an enhancement, could you create a version of this using Apps Script that, instead of sending an email, updates a Hangouts Chat room it has been added to?

Thanks!

Asset Inventory tool : Permission denied to template file

Hey guys,
I am trying to run the Asset Inventory tool and the service account I use can't get the template file; I'm getting this error:
Template file failed to load: gs://professional-services-tools-asset-inventory/latest/import_pipeline. Permissions denied. @appspot.gserviceaccount.com does not have storage.objects.get access to professional-services-tools-asset-inventory/latest/import_pipeline

Should I create and upload the template myself?

Fix data-analytics/iot-nirvana/client/pom.xml

There is a vulnerability in org.eclipse.paho:org.eclipse.paho.client.mqttv3.

The proposed remediation is:
Upgrade org.eclipse.paho:org.eclipse.paho.client.mqttv3 to version 1.2.1 or later.

Also, please lint your Java files using google-java-format.

You can use:

java -jar /usr/share/java/google-java-format-1.7-all-deps.jar -r my-file.java

on each java file to lint your code according to Google guidelines for java.

Add flake8 to CI tool to catch python errors sooner

This repo has a lot of issues with consistency.
This will likely be a heavy lift at first but will pay dividends in the long run as the repo grows.

Let's add some more static checks:

  • flake8
  • shellcheck

Let's find a way to automate running the tests in cloud build for new assets.

Can't see bigquery table created.

I tried to run the example, but met a couple of problems.

First, I didn't see the BigQuery table created after running "run_oncloud.sh", although the Dataflow job was created and running. Second, I didn't see any device registered in the IoT registry.

The IoT registry was created manually, because setup_gcp_environment.sh failed when I ran it with region us-east1, and IoT only works in us-central1. I don't know if this matters.

./setup_gcp_environment.sh iot-poc-219115 us-east1 us-east1-b ppiot iotpub1 iotsub1 ppiot

Executing gcloud beta iot registries create ppiot --region us-east1 --event-notification-config=topic=projects/iot-poc-219115/topics/iotpub1
ERROR: (gcloud.iot.registries.create) NOT_FOUND: Cloud region not supported by this service. The name 'projects/iot-poc-219115/locations/us-east1/registries/ppiot' specifies the location 'us-east1', valid cloud regions are {asia-east1,europe-west1,us-central1}.

gcloud beta iot registries create ppiot --region us-central1 --event-notification-config=topic=projects/iot-poc-219115/topics/iotpub1

It would be appreciated if you could advise how I can fix these issues.

Thanks

Larry

Start Subscriber/View

Hi,
Everything is set up and EnergyDisaggregationDemo_Client is up and running, but I cannot get rid of the following error (I also tried declaring subscriber as global):

Would you help me pls?

Creating subscription "sub1" to topic "pred" ...
Subscription "sub1" existed.


UnboundLocalError Traceback (most recent call last)
in
6 subscription_name=SUB_NAME,
7 app_id_name_map=app_id_name_map,
----> 8 target_device=DEVICE_ID)
9 tt.async_pull_msg()

in init(self, project_id, ground_truth, topic_name, subscription_name, app_id_name_map, target_device)
22 # create subscription
23 self._subscriber, self._subscription_path = (
---> 24 self.create_subscription(project_id, topic_name, subscription_name))
25 self._subscriber.subscribe(self._subscription_path,
26 callback=self._msg_callback)

in create_subscription(self, project_id, topic_name, subscription_name)
52 except Exception as e:
53 print('Subscription "{}" existed.'.format(subscription_name))
---> 54 return subscriber, subscription_path
55
56 def async_pull_msg(self):

UnboundLocalError: local variable 'subscriber' referenced before assignment
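
The traceback suggests that create_subscription assigns subscriber and subscription_path inside a try block while the except branch still returns them, so any exception raised before those assignments leaves the names unbound. A hedged sketch of a version that binds both names before attempting the create call, assuming the google-cloud-pubsub client the notebook appears to use:

from google.api_core import exceptions
from google.cloud import pubsub_v1

def create_subscription(project_id, topic_name, subscription_name):
    # Bind these before the try block so the "already exists" branch can
    # still return them safely.
    subscriber = pubsub_v1.SubscriberClient()
    topic_path = "projects/{}/topics/{}".format(project_id, topic_name)
    subscription_path = subscriber.subscription_path(project_id, subscription_name)
    try:
        subscriber.create_subscription(name=subscription_path, topic=topic_path)
        print('Subscription "{}" created.'.format(subscription_name))
    except exceptions.AlreadyExists:
        print('Subscription "{}" existed.'.format(subscription_name))
    return subscriber, subscription_path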

iot-nirvana: setup_gcp_environment.sh fails

The script fails trying to create a boot disk, just before deleting the compute instance. The error description:

ERROR: (gcloud.compute.images.create) Could not fetch resource:
 - The disk resource 'projects/$MY_PROJECT/zones/us-central1-a/disks/debian9-java8-img' is already being used by 'projects/$MY_PROJECT/zones/us-central1-a/instances/debian9-java8-img'

My real project id replaced with $MY_PROJECT in this error message.

Possible Solution:

  • Update the startup_install_java8.sh script to use sudo for installing default-jre
  • The poweroff command that's in the script didn't work when I tried it from the VM itself by SSHing into it.

Asset exporter tool - getting ImportError in GAE

Just tried to set up from scratch in a new project. Followed the steps from the README.
When running the cron job I get this

ImportError: cannot import name 'expr_pb2' from 'google.type' (/env/lib/python3.7/site-packages/google/type/__init__.py)
at (/env/lib/python3.7/site-packages/google/iam/v1/policy_pb2.py:16)
at (/env/lib/python3.7/site-packages/google/iam/v1/iam_policy_pb2.py:17)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/proto/assets_pb2.py:19)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/proto/asset_service_pb2.py:20)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/types.py:23)
at (/env/lib/python3.7/site-packages/google/cloud/asset_v1/__init__.py:20)
at (/srv/lib/asset_inventory/export.py:33)
at (/srv/main.py:45)
at import_app (/env/lib/python3.7/site-packages/gunicorn/util.py:350)
at load_wsgiapp (/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py:41)
at load (/env/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py:52)
at wsgi (/env/lib/python3.7/site-packages/gunicorn/app/base.py:67)
at load_wsgi (/env/lib/python3.7/site-packages/gunicorn/workers/base.py:138)
at init_process (/env/lib/python3.7/site-packages/gunicorn/workers/base.py:129)
at init_process (/env/lib/python3.7/site-packages/gunicorn/workers/gthread.py:104)
at spawn_worker (/env/lib/python3.7/site-packages/gunicorn/arbiter.py:583)

ValueError: Unable to get the Filesystem for path gs://XXXXX/XXXXX

Hi Guys,

Trying to follow the Dataflow example "data_ingestion.py", but every time I run it I get the following error:

Traceback (most recent call last):
  File "pipeline01.py", line 115, in <module>
    run()
  File "pipeline01.py", line 97, in run
    | 'Write to BigQuery' >> beam.io.Write(
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", line 522, in __init__
    skip_header_lines=skip_header_lines)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", line 117, in __init__
    validate=validate)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsource.py", line 118, in __init__
    self._validate()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py", line 124, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsource.py", line 175, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filesystems.py", line 153, in match
    filesystem = FileSystems.get_filesystem(patterns[0])
  File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filesystems.py", line 84, in get_filesystem
    raise ValueError('Unable to get the Filesystem for path %s' % path)
ValueError: Unable to get the Filesystem for path gs://python-dataflow-example/data_files/head_usa_names.csv

The package versions I have on my machine:

  • apache-beam==2.4.0

  • google-cloud==0.32.0

  • google-cloud-storage==1.6.0

  • google-cloud-bigquery==1.1.0

  • google-cloud-dataflow==2.4.0

I can't spot what I'm missing or doing wrong here.
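
A hedged guess at the cause, based only on the package list above: in Beam 2.4.0 the gs:// scheme is registered by apache_beam.io.gcp.gcsfilesystem, which is only installed with the [gcp] extra, so an environment that ends up with plain apache-beam raises exactly this "Unable to get the Filesystem" error. A minimal sketch of the fix under that assumption:

# Reinstall Beam with the GCP extra so GCSFileSystem gets registered,
# keeping the 2.4.0 version already present in the environment.
pip install --upgrade 'apache-beam[gcp]==2.4.0'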

Questions and Feedback on the GSuite Exporter

Questions

  1. Are more G Suite Admin APIs supported now? The documentation indicates activity reports for admin, drive, login, mobile, and token are supported, but the error message below shows that the API accepts other activities as well:
<HttpError 400 when requesting https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/login%20token?alt=json returned "Invalid value 'login token'. Values must match the following regular expression: '(admin)|(calendar)|(drive)|(login)|(mobile)|(token)|(groups)|(saml)|(chat)|(gplus)|(rules)|(jamboard)|(meet)|(user_accounts)|(access_transparency)'
  2. Are there example screenshots of what the exported logs would look like?

  3. If the script runner is expected to authenticate with his or her G Suite / Cloud Identity Super Admin user account, how would we run this export command periodically as a cron job on a GCE instance? Can the script runner be a service account? If so, what would the authentication process look like?

  4. When I tried to use a service account as the argument for --admin-user, I got the following error:

unauthorized_client: Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested.

How would the proper authorization be granted to a service account we want to use?

Feedback

  1. One of the listed requirements is NOT accurate. roles/iam.tokenCreator should really be roles/iam.serviceAccountTokenCreator.

  2. This example, as is, yields an error, because the --applications option cannot handle a space-delimited list of values. To export additional activities, I had to run the gsuite-exporter command for each application separately (see the loop sketch after this list):

gsuite-exporter \
  --credentials-path='/path/to/service/account/credentials.json' \
  --admin-user='<your_gsuite_admin>@<your_domain>' \
  --api='report_v1' \
  --applications='login drive token' \
  --project-id='<logging_project>' \
  --exporter='stackdriver_exporter.StackdriverExporter'
  3. The process of configuring the G Suite admin account should be better documented. I had to scour the web to find documentation on a third-party site. Given how different G Suite is from GCP, please invest the time to explain in plain English why each step in the setup is required.

  4. Is there a reason this Python package is limited to Python 2.7? Please consider getting it working on Python 3 as well.

  5. Please explain the purpose of each parameter. The repo currently does not document the required parameters (--credentials-path, --admin-user, --api, --applications, --project-id, and --exporter). This is especially frustrating because a naive user would not immediately know why both --credentials-path and --admin-user are required, or that --admin-user is expected to be the G Suite / Cloud Identity Super Admin human user. Given that expectation, the README should also explain how the Super Admin is supposed to authenticate. To get the script to run, I figured out that I needed to run gcloud auth login first to authenticate with my Super Admin identity; that detail should be stated clearly in the README.
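
The per-application workaround mentioned in feedback item 2 above, sketched as a small loop; the flags and placeholder values in angle brackets are the same ones used in the example command and must be replaced with real values.

# Invoke gsuite-exporter once per application instead of passing a
# space-delimited list to --applications.
for app in login drive token; do
  gsuite-exporter \
    --credentials-path='/path/to/service/account/credentials.json' \
    --admin-user='<your_gsuite_admin>@<your_domain>' \
    --api='report_v1' \
    --applications="$app" \
    --project-id='<logging_project>' \
    --exporter='stackdriver_exporter.StackdriverExporter'
done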

Error using Cloud Asset Inventory Import To BigQuery

At step 11 in the quick start, I run:

gcloud dataflow jobs run $JOB_NAME \
  --gcs-location gs://professional-services-tools-asset-inventory/latest/import_pipeline \
  --parameters="input=$BUCKET/*.json,stage=$BUCKET/stage,load_time=$LOAD_TIME,group_by=ASSET_TYPE,dataset=asset_inventory,write_disposition=WRITE_APPEND" \
  --staging-location $BUCKET/staging

After running this command, the Dataflow console gives me these errors:

Autoscaling is enabled for job 2019-03-19_15_09_01-12592583255883649300. The number of workers will be between 1 and 1000.

Autoscaling was automatically enabled for job 2019-03-19_15_09_01-12592583255883649300.

Checking permissions granted to controller Service Account.

Staged package apache_beam-2.9.0-cp27-cp27mu-manylinux1_x86_64.whl at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/apache_beam-2.9.0-cp27-cp27mu-manylinux1_x86_64.whl' is inaccessible.

Staged package dataflow_python_sdk.tar at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/dataflow_python_sdk.tar' is inaccessible.

Staged package pickled_main_session at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/pickled_main_session' is inaccessible.

Staged package workflow.tar.gz at location 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/beamapp-bmenasha-0316165513-832710.1552755313.832831/workflow.tar.gz' is inaccessible.

Workflow failed. Causes: One or more access checks for temp location or staged files failed. Please refer to other error messages for details. For more information on security and permissions, please see https://cloud.google.com/dataflow/security-and-permissions.

Cleaning up.

Worker pool stopped.
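
A hedged diagnostic sketch, assuming the failure is an access problem on the maintainer's staging bucket rather than in this project: if the staged objects cannot be read with your own credentials either, the template is pointing at objects that are no longer publicly readable, and no permission change on the controller service account will help. The object paths are taken from the error messages above.

# Check whether the template and its staged packages are readable at all.
gsutil stat 'gs://professional-services-tools-asset-inventory/latest/import_pipeline'
gsutil ls 'gs://professional-services-tools-asset-inventory/export_resources_staging_location/'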
