ds2bq

Import Datastore backups into BigQuery and clean up old Datastore backup information.

How does it work?

  1. Set up Google Cloud Storage Object Change Notification.
  2. Set up Datastore Scheduled Backups.
  3. Receive the webhook and import the data into BigQuery when cron creates a backup.
    • appengine (backup cron) -> GCS object (sends notification by webhook) -> appengine (import into bq)
  4. Clean up old backups: the files on GCS (by lifecycle) and the metadata on Datastore (by cron).
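
Step 3 above can be sketched as a small webhook handler. This is only an illustration, not the ds2bq implementation: it assumes that the notification carries an X-Goog-Resource-State header and a JSON body with "bucket" and "name" fields, and that per-kind backup files are named like "<id>.<Kind>.backup_info". The names objectInfo, kindFromBackupInfo and handleNotification, and the object name "abc123.User.backup_info", are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// objectInfo holds the two fields of the notification body used by this sketch.
type objectInfo struct {
	Bucket string `json:"bucket"`
	Name   string `json:"name"`
}

// kindFromBackupInfo extracts the kind from an object name such as
// "<id>.<Kind>.backup_info" (assumed naming). It returns false for any
// other object, including a backup_info file that carries no kind.
func kindFromBackupInfo(name string) (string, bool) {
	const suffix = ".backup_info"
	if !strings.HasSuffix(name, suffix) {
		return "", false
	}
	base := strings.TrimSuffix(name, suffix)
	if i := strings.LastIndex(base, "/"); i >= 0 {
		base = base[i+1:] // drop any directory-like prefix
	}
	i := strings.LastIndex(base, ".")
	if i < 0 || i == len(base)-1 {
		return "", false
	}
	return base[i+1:], true
}

// handleNotification ignores everything except "exists" notifications for
// per-kind backup files. A real handler would start a BigQuery load job
// where this sketch only writes a message.
func handleNotification(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("X-Goog-Resource-State") != "exists" {
		w.WriteHeader(http.StatusOK) // e.g. the initial "sync" message
		return
	}
	var obj objectInfo
	if err := json.NewDecoder(r.Body).Decode(&obj); err != nil {
		http.Error(w, "invalid notification body", http.StatusBadRequest)
		return
	}
	kind, ok := kindFromBackupInfo(obj.Name)
	if !ok {
		w.WriteHeader(http.StatusOK) // not a per-kind backup file
		return
	}
	fmt.Fprintf(w, "would import gs://%s/%s (kind %s)\n", obj.Bucket, obj.Name, kind)
}

func main() {
	// Exercise the handler locally with a fake notification.
	body := `{"bucket": "foobar-datastore-backups", "name": "abc123.User.backup_info"}`
	req := httptest.NewRequest(http.MethodPost, "/api/gcs/object-change-notification", strings.NewReader(body))
	req.Header.Set("X-Goog-Resource-State", "exists")
	rec := httptest.NewRecorder()
	handleNotification(rec, req)
	fmt.Print(rec.Body.String())
}
```

Returning 200 for notifications the handler does not care about keeps the sender from treating them as failed deliveries.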

Setup

Coding

See the example.

Prepare

Environment variables & account

We will use the environment variables below. Change the values to match your own settings.

$ SERVICE_ACCOUNT_NAME=gcs-objectchangenotification
$ APP_ID=foobar
$ BACKUP_BUCKET=foobar-datastore-backups
$ API_ENDPOINT=https://foobar.appspot.com/api/gcs/object-change-notification
$ echo ${SERVICE_ACCOUNT_NAME} ${APP_ID} ${BACKUP_BUCKET} ${API_ENDPOINT}

We will run some commands on the local machine. Set up the gcloud command to use the service account.

$ gcloud auth activate-service-account ${SERVICE_ACCOUNT_NAME}@${APP_ID}.iam.gserviceaccount.com --key-file <downloaded secret key file path>
$ gcloud auth list

GCS OCN setup

https://cloud.google.com/storage/docs/object-change-notification

You MUST save the execution log. The channel and resource identifiers it contains are needed to stop the channel later.

$ gsutil acl ch -u ${APP_ID}@appspot.gserviceaccount.com:O gs://${BACKUP_BUCKET}
$ gsutil notification watchbucket ${API_ENDPOINT} gs://${BACKUP_BUCKET}
Watching bucket gs://foobar-datastore-backups/ with application URL https://foobar.appspot.com/api/gcs/object-change-notification ...
Successfully created watch notification channel.
Watch channel identifier: XXXXX
Canonicalized resource identifier: YYYYYY
Client state token: None

If you want to stop receiving notifications, stop the channel.

$ gsutil notification stopchannel XXXXX YYYYYY

These parameters cannot be obtained again with any command (as far as we know).
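
Because the identifiers matter later, it may help to extract them from the saved log right away. The following is a minimal sketch that assumes the log has the same line labels as the output shown above; channelInfo and parseWatchLog are hypothetical names, not part of ds2bq or gsutil.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// channelInfo holds the two values that `gsutil notification stopchannel` takes.
type channelInfo struct {
	ChannelID  string
	ResourceID string
}

const (
	channelLabel  = "Watch channel identifier:"
	resourceLabel = "Canonicalized resource identifier:"
)

// parseWatchLog scans saved `gsutil notification watchbucket` output and
// picks out the channel and resource identifiers by their line labels.
func parseWatchLog(saved string) (channelInfo, error) {
	var info channelInfo
	sc := bufio.NewScanner(strings.NewReader(saved))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, channelLabel):
			info.ChannelID = strings.TrimSpace(strings.TrimPrefix(line, channelLabel))
		case strings.HasPrefix(line, resourceLabel):
			info.ResourceID = strings.TrimSpace(strings.TrimPrefix(line, resourceLabel))
		}
	}
	if err := sc.Err(); err != nil {
		return info, err
	}
	if info.ChannelID == "" || info.ResourceID == "" {
		return info, errors.New("channel or resource identifier not found in log")
	}
	return info, nil
}

func main() {
	saved := `Successfully created watch notification channel.
Watch channel identifier: XXXXX
Canonicalized resource identifier: YYYYYY
Client state token: None`

	info, err := parseWatchLog(saved)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the command that would stop this channel.
	fmt.Printf("gsutil notification stopchannel %s %s\n", info.ChannelID, info.ResourceID)
}
```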

GCS lifecycle setup

https://cloud.google.com/storage/docs/managing-lifecycles

Set the expiration age to the same value as DatastoreManagementService#ExpireDuration (Go code).

$ cat additional-settings.json
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "Delete"
        },
        "condition": {
          "age": 30
        }
      }
    ]
  }
}
$ gsutil lifecycle get gs://${BACKUP_BUCKET} > bucket-lifecycle.json
# merge JSON manually
$ gsutil lifecycle set bucket-lifecycle.json gs://${BACKUP_BUCKET}
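
The manual merge step could also be scripted. The sketch below is one possible approach under stated assumptions: it accepts either the {"lifecycle": {"rule": [...]}} shape used in additional-settings.json above or a bare {"rule": [...]} shape, and it assumes the existing configuration is valid JSON (check what `gsutil lifecycle get` prints for your bucket and gsutil version first). The names config, rulesOf and mergeLifecycle are hypothetical. Rules are concatenated as-is and duplicates are not removed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lifecycle is the inner object that holds the rule list.
type lifecycle struct {
	Rule []json.RawMessage `json:"rule"`
}

// config accepts both the wrapped and the bare shape described above.
type config struct {
	Lifecycle *lifecycle        `json:"lifecycle,omitempty"`
	Rule      []json.RawMessage `json:"rule,omitempty"`
}

// rulesOf returns the rule list from either shape without interpreting the rules.
func rulesOf(data []byte) ([]json.RawMessage, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.Lifecycle != nil {
		return c.Lifecycle.Rule, nil
	}
	return c.Rule, nil
}

// mergeLifecycle appends the additional rules to the existing ones and
// returns the result in the wrapped shape.
func mergeLifecycle(existing, additional []byte) ([]byte, error) {
	a, err := rulesOf(existing)
	if err != nil {
		return nil, fmt.Errorf("existing config: %w", err)
	}
	b, err := rulesOf(additional)
	if err != nil {
		return nil, fmt.Errorf("additional config: %w", err)
	}
	rules := make([]json.RawMessage, 0, len(a)+len(b))
	rules = append(rules, a...)
	rules = append(rules, b...)
	return json.MarshalIndent(config{Lifecycle: &lifecycle{Rule: rules}}, "", "  ")
}

func main() {
	// Hypothetical existing rule plus the 30-day delete rule from above.
	existing := []byte(`{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 365}}]}`)
	additional := []byte(`{"lifecycle": {"rule": [{"action": {"type": "Delete"}, "condition": {"age": 30}}]}}`)

	merged, err := mergeLifecycle(existing, additional)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(merged))
}
```

Review the merged file by hand before passing it to `gsutil lifecycle set`, since setting a lifecycle configuration replaces the bucket's current one.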
