isabella232 / spanner-table-copy-pipeline

This project forked from cloudspannerecosystem/spanner-table-copy-pipeline

License: Apache License 2.0

Java 100.00%

Spanner Table Copy

An Apache Beam pipeline that runs a query on a Spanner database and writes the results to a Spanner table.

This pipeline can be used to transform or update a database in several ways without having to work around per-transaction mutation limits:

  • Transforming data that is read using SQL and writing the results back, e.g. the equivalent of the following pseudo-SQL:
  INSERT OR UPDATE INTO <table> (key, value)
  (
     SELECT key, value*100 FROM <table> WHERE <condition>
  )
  • Point-in-time recovery by reading the database state at a point in the past (within the Spanner database's version retention period) and then writing it back, e.g. the equivalent of the following pseudo-SQL:
  INSERT OR UPDATE INTO <table> (key, value)
  (
     SELECT key, value
     FROM <table> FOR SYSTEM_TIME AS OF <timestamp_RFC3339>
     WHERE <condition>
  )
  • Copying data from one database to another, e.g. the equivalent of the following pseudo-SQL:
  INSERT OR UPDATE INTO <table@project2/instance2/database2> (key, value)
  (
     SELECT key, value FROM <table@project1/instance1/database1>
  )

Usage:

Show help text:

mvn compile exec:java -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy \
    -Dexec.args='--help=com.google.cloud.solutions.SpannerTableCopy$SpannerTableCopyOptions'

Execute the pipeline:

For example, to copy a table as it was at a timestamp in the past into a different database:

mvn compile exec:java \
    -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy \
    -Pdataflow-runner \
    -Dexec.args="
         --runner=DataflowRunner

         --sourceProjectId=SOURCE_PROJECT
         --sourceInstanceId=SOURCE_INSTANCE
         --sourceDatabaseId=SOURCE_DATABASE

         --sqlQuery='select * from SOURCE_TABLE'
         --readTimestamp=2022-01-01T12:00:00Z

         --destinationProjectId=DEST_PROJECT
         --destinationInstanceId=DEST_INSTANCE
         --destinationDatabaseId=DEST_DATABASE
         --destinationTable=DEST_TABLE
         --writeMode=WRITE_MODE

         --mutationReportFile=gs://BUCKET/PATH/report
         --failureLogFile=gs://BUCKET/PATH/failures
    "

More examples can be found in the SpannerTableCopyIntegrationTest.
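The --readTimestamp value in the command above is an RFC 3339 timestamp in UTC. As a minimal sketch, assuming GNU date (Linux) and a hypothetical variable name READ_TIMESTAMP, a recent timestamp can be generated in the shell instead of being typed by hand. The offset of 30 minutes is only an illustration; the timestamp must fall within the source database's version retention period.

```shell
# Build an RFC 3339 UTC timestamp for 30 minutes ago.
# Assumes GNU date; on macOS/BSD the equivalent would be:
#   date -u -v-30M +%Y-%m-%dT%H:%M:%SZ
READ_TIMESTAMP="$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)"

# Print the option as it would appear inside -Dexec.args.
echo "--readTimestamp=${READ_TIMESTAMP}"
```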

Note: take care with Bash quoting of the -Dexec.args value and of the --sqlQuery value nested inside it.
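One way to keep the quoting manageable, sketched here with hypothetical variable names (SQL_QUERY, EXEC_ARGS), is to build the argument string in a variable and print it before handing it to Maven. Inside the outer double quotes Bash still expands variables, while the inner single quotes are passed through literally so that the query stays a single argument.

```shell
# The query is kept in its own variable; single quotes here stop Bash
# from expanding the * glob or splitting the value on spaces.
SQL_QUERY='select * from SOURCE_TABLE'

# Outer double quotes: ${SQL_QUERY} is expanded.
# Inner single quotes: kept as literal characters in the resulting string.
EXEC_ARGS="--sqlQuery='${SQL_QUERY}' --destinationTable=DEST_TABLE"

# Inspect exactly what Maven would receive before running the pipeline.
printf '%s\n' "-Dexec.args=${EXEC_ARGS}"

# The string could then be passed on as (not executed in this sketch):
#   mvn compile exec:java ... "-Dexec.args=${EXEC_ARGS}"
```

Printing the string first makes it easy to spot a query that was split or expanded unexpectedly.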

The --dryRun parameter can be used to create the mutation report file without actually writing to the database.

The --writeMode values correspond to the values of the Mutation.Op enum.
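For reference, the Mutation.Op enum in the Cloud Spanner Java client defines INSERT, UPDATE, INSERT_OR_UPDATE, REPLACE and DELETE. The check below is a sketch only: the helper name is_valid_write_mode is hypothetical, and this README does not state whether the pipeline accepts every one of these values (DELETE in particular), so treat the list as an assumption to verify against the --help output.

```shell
# Return success if the argument is one of the Mutation.Op enum names.
# Assumption: the pipeline may accept only a subset of these.
is_valid_write_mode() {
  case "$1" in
    INSERT|UPDATE|INSERT_OR_UPDATE|REPLACE|DELETE) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: validate a value before building the command line.
WRITE_MODE=INSERT_OR_UPDATE
if is_valid_write_mode "$WRITE_MODE"; then
  echo "--writeMode=${WRITE_MODE}"
else
  echo "Unknown write mode: ${WRITE_MODE}" >&2
fi
```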

Contributors: nielm
