
dynamodb-replicator's Introduction

dynamodb-replicator

dynamodb-replicator offers several different mechanisms to manage redundancy and recoverability on DynamoDB tables.

  • A replicator function that processes events from a DynamoDB stream, replaying changes made to the primary table onto a replica table. The function is designed to be run as an AWS Lambda function (a wiring sketch follows this list).
  • An incremental backup function that processes events from a DynamoDB stream, replaying them as writes to individual objects on S3. The function is designed to be run as an AWS Lambda function.
  • A consistency check script that scans the primary table and checks that each individual record in the replica table is up-to-date. The goal is to double-check that the replicator is performing as it should, and that the two tables are completely consistent.
  • A table dump script that scans a single table, and writes the data to a file on S3, providing a snapshot of the table's state.
  • A snapshot script that scans an S3 folder where incremental backups have been made, and writes the aggregate to a file on S3, providing a snapshot of the backup's state.
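
For orientation, here is a minimal sketch of wiring the two stream-driven functions as Lambda handlers. It is based on the exported replicate and backup functions referenced in the issues below; the replicator reads its configuration (ReplicaTable, ReplicaRegion, ReplicaEndpoint) from environment variables, and the backup bucket/prefix configuration is assumed to work the same way.

// Minimal sketch: expose dynamodb-replicator's stream processors as
// Lambda handlers. Configuration is read from environment variables.
var replicator = require('dynamodb-replicator');

// Replays each DynamoDB stream event onto the replica table. One issue
// below points its Lambda handler at index.replicate directly.
module.exports.replicate = replicator.replicate;

// Writes each changed record as an individual object on S3, mirroring
// the handler shown in the "different file names" issue below.
module.exports.backup = function(event, context, callback) {
    return replicator.backup(event, callback);
};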

Design

Managing table redundancy and backups involves many moving parts. Please read DESIGN.md for an in-depth explanation.

Utility scripts

dynamodb-replicator provides several CLI tools to help manage your DynamoDB table.

diff-record

Given two tables and an item's key, this script looks up the record in both tables and checks for consistency.

$ npm install -g dynamodb-replicator
$ diff-record --help

Usage: diff-record <primary region/table> <replica region/table> <key>

# Check for discrepancies between an item in two tables
$ diff-record us-east-1/primary eu-west-1/replica '{"id":"abc"}'

diff-tables

Given two tables and a set of options, performs a complete consistency check on the two tables, optionally repairing records in the replica table that differ from the primary.

$ npm install -g dynamodb-replicator
$ diff-tables --help

Usage: diff-tables primary-region/primary-table replica-region/replica-table

Options:
  --repair     perform actions to fix discrepancies in the replica table
  --segment    segment identifier (0-based)
  --segments   total number of segments
  --backfill   only scan primary table and write to replica

# Log information about discrepancies between the two tables
$ diff-tables us-east-1/primary eu-west-2/replica

# Repair the replica to match the primary
$ diff-tables us-east-1/primary eu-west-2/replica --repair

# Only backfill the replica. Useful for starting a new replica
$ diff-tables us-east-1/primary eu-west-2/new-replica --backfill --repair

# Perform one segment of a parallel scan
$ diff-tables us-east-1/primary eu-west-2/replica --repair --segment 0 --segments 10
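
To cover every segment of a parallel scan, you would run one process per segment. A minimal sketch in Node, using the same placeholder table names as above and assuming the CLI tools are installed globally:

// Sketch: launch one diff-tables child process per segment of a
// 10-segment parallel scan against the example tables above.
var spawn = require('child_process').spawn;

var segments = 10;
for (var i = 0; i < segments; i++) {
    spawn('diff-tables', [
        'us-east-1/primary', 'eu-west-2/replica',
        '--repair',
        '--segment', String(i),
        '--segments', String(segments)
    ], { stdio: 'inherit' });
}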

replicate-record

Given two tables and an item's key, this script ensures that the replica record is synchronized with its current state in the primary table.

$ npm install -g dynamodb-replicator
$ replicate-record --help

Usage: replicate-record <primary tableinfo> <replica tableinfo> <recordkey>
 - primary tableinfo: the primary table to replicate from, specified as `region/tablename`
 - replica tableinfo: the replica table to replicate to, specified as `region/tablename`
 - recordkey: the key for the record specified as a JSON object

# Copy the state of a record from the primary to the replica table
$ replicate-record us-east-1/primary eu-west-1/replica '{"id":"abc"}'

backup-table

Scans a table and dumps the entire set of records as a line-delimited JSON file on S3.

$ npm install -g dynamodb-replicator
$ backup-table --help

Usage: backup-table region/table s3url

Options:
  --jobid      assign a jobid to this backup
  --segment    segment identifier (0-based)
  --segments   total number of segments
  --metric     CloudWatch metric namespace. Provides the dimension TableName = the name of the backed-up table.

# Writes a backup file to s3://my-bucket/some-prefix/<random string>/0
$ backup-table us-east-1/primary s3://my-bucket/some-prefix

# Specifying a jobid guarantees the S3 location
# Writes a backup file to s3://my-bucket/some-prefix/my-job-id/0
$ backup-table us-east-1/primary s3://my-bucket/some-prefix --jobid my-job-id

# Perform one segment of a parallel backup
# Writes a backup file to s3://my-bucket/some-prefix/my-job-id/4
$ backup-table us-east-1/primary s3://my-bucket/some-prefix --jobid my-job-id --segment 4 --segments 10

incremental-backfill

Scans a table and dumps each individual record as an object to a folder on S3.

$ npm install -g dynamodb-replicator
$ incremental-backfill --help

Usage: incremental-backfill region/table s3url

# Write each item in the table to S3. `s3url` should provide any desired bucket/prefix.
# The name of the table will be appended to the s3 prefix that you provide.
$ incremental-backfill us-east-1/primary s3://dynamodb-backups/incremental

incremental-snapshot

Reads each item in an S3 folder representing an incremental table backup, and writes an aggregate line-delimited JSON file to S3.

$ npm install -g dynamodb-replicator
$ incremental-snapshot --help

Usage: incremental-snapshot <source> <dest>

Options:
  --metric     CloudWatch metric region/namespace/tablename. Provides the dimension TableName = the tablename.

# Aggregate all the items in an S3 folder into a single snapshot file
$ incremental-snapshot s3://dynamodb-backups/incremental/primary s3://dynamodb-backups/snapshots/primary

incremental-diff-record

Checks for consistency between a DynamoDB record and its backed-up version on S3.

$ npm install -g dynamodb-replicator
$ incremental-diff-record --help

Usage: incremental-diff-record <tableinfo> <s3url> <recordkey>
 - tableinfo: the table where the record lives, specified as `region/tablename`
 - s3url: s3 folder where the incremental backups live
 - recordkey: the key for the record specified as a JSON object

# Check that a record is up-to-date in the incremental backup
$ incremental-diff-record us-east-1/primary s3://dynamodb-backups/incremental '{"id":"abc"}'

incremental-backup-record

Copies a DynamoDB record's present state to an incremental backup folder on S3.

$ npm install -g dynamodb-replicator
$ incremental-backup-record --help

Usage: incremental-backup-record <tableinfo> <s3url> <recordkey>
 - tableinfo: the table to backup from, specified as `region/tablename`
 - s3url: s3 folder into which the record should be backed up
 - recordkey: the key for the record specified as a JSON object

# Backup a single record to S3
$ incremental-backup-record us-east-1/primary s3://dynamodb-backups/incremental '{"id":"abc"}'

incremental-record-history

Prints each version of a record that is available in an incremental backup folder on S3.

$ incremental-record-history --help

Usage: incremental-record-history <tableinfo> <s3url> <recordkey>
 - tableinfo: the table where the record lives, specified as `region/tablename`
 - s3url: s3 folder where the incremental backups live. Table name will be appended
 - recordkey: the key for the record specified as a JSON object

# Read the history of a single record
$ incremental-record-history us-east-1/my-table s3://dynamodb-backups/incremental '{"id":"abc"}'

dynamodb-replicator's People

Contributors

freenerd, immad-imtiaz, kritchie, mapbox-re-oa, mick, prince6635, rclark, ryndaniels, shwetha-mc, tmcw, willwhite, zakcrawford


dynamodb-replicator's Issues

Cannot restore table backup done using dynamodb-replicator

I executed this command:

backup-table eu-west-1/ARTIFICIAL_APPLICANT_ID s3://some-s3

The backup file is on S3, but it contains binary data, not JSON.

When I run
s3print s3://some-s3/ull/bba24099b07f53f5/0 | dyno import eu-west-1/TMP_ATSI_ARTIFICIAL_APPLICANT_ID

it says:
undefined:1

^

SyntaxError: Unexpected token in JSON at position 0
at JSON.parse ()
at Function.module.exports.deserialize (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\lib\serialization.js:49:18)
at Transform.Parser.parser._transform (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\bin\cli.js:94:25)
at Transform._read (_stream_transform.js:186:10)
at Transform._write (_stream_transform.js:174:12)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Transform.Writable.write (_stream_writable.js:290:11)
at Stream.ondata (internal/streams/legacy.js:16:26)
at emitOne (events.js:116:13)
events.js:183
throw er; // Unhandled 'error' event
^

Error: write EPIPE
at _errnoException (util.js:1024:11)
at Socket._writeGeneric (net.js:767:25)
at Socket._write (net.js:786:8)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Socket.Writable.write (_stream_writable.js:290:11)
at Socket.write (net.js:704:40)
at PassThrough.ondata (_stream_readable.js:639:20)
at emitOne (events.js:116:13)
at PassThrough.emit (events.js:211:7)
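
(A note on the binary contents: the snapshot files produced by this toolchain appear to be gzip-compressed; a later issue on this page pipes them through gzcat before reading. A hedged sketch of reading one backup object that way, with the bucket and key taken from the report above:)

// Sketch: read a (presumably gzipped) backup object from S3 and parse
// it as line-delimited JSON. Bucket/key are from the report above.
var zlib = require('zlib');
var readline = require('readline');
var AWS = require('aws-sdk');

var s3 = new AWS.S3({ region: 'eu-west-1' });
var raw = s3.getObject({
    Bucket: 'some-s3',
    Key: 'ull/bba24099b07f53f5/0'
}).createReadStream();

readline.createInterface({ input: raw.pipe(zlib.createGunzip()) })
    .on('line', function(line) {
        if (line) console.log(JSON.parse(line)); // one record per line
    });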

Even more logging

Debugging failed lambda invocations is a difficult thing. Some insights on what we need to log better:

Replication function

  • We will only run one replication per key, even if that key is affected more than once. We need a list of the unique keys, and a cross-checkable list of the keys that were affected
  • Keep a running count of number of records that have been successfully replicated for quick and easy comparison

Backup function

  • We run each change, even if that means running more than once per unique key. This means we need a list of each change-key combo, cross-checkable against the changes that have been implemented
  • I'm wondering if we should use a setTimeout to print a list of the changes that failed to be implemented within 58s or so.
  • Consider logging an md5sum of each change/key combination in an invocation (see the sketch after this list). It's difficult to search CloudWatch logs for JSON objects, which is what you'd like to do in order to confirm that a change/key was retried.
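
A sketch of that md5 idea (the event shape follows DynamoDB stream records, as in the code elsewhere on this page):

// Sketch: derive one md5 token per change/key combination so a retried
// change can be found in CloudWatch Logs by searching for the hash.
var crypto = require('crypto');

function changeId(change) {
    return crypto.createHash('md5')
        .update(change.eventName)                     // INSERT / MODIFY / REMOVE
        .update(JSON.stringify(change.dynamodb.Keys)) // the record's key
        .digest('hex');
}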

Underscore .isEqual fails for buffers

This equality check is inadequate for comparing two JavaScript objects that may include buffers. This leads to false-positive different-in-replica reports.

$ node
> var u = require('underscore');
undefined
> var a = { hello: new Buffer('world') };
undefined
> var b = { hello: new Buffer('world') };
undefined
> u.isEqual(a, b)
false
> var assert = require('assert')
undefined
> assert.deepEqual(a, b)
undefined
> 

Not usable tool

It's nice that you have implemented such a tool, but there is no way to restore a table from a backup.

Questions:

  • Why is there no restore option in dynamodb-replicator?
  • When I try to restore using dyno (which I assume can read these backups), I end up with this error:

s3print s3://some-s3/ull/bba24099b07f53f5/0 | dyno put eu-west-1/TMP_ATSI_ARTIFICIAL_APPLICANT_ID
undefined:1

^

SyntaxError: Unexpected token in JSON at position 0
at JSON.parse ()
at Function.module.exports.deserialize (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\lib\serialization.js:49:18)
at Transform.Parser.parser._transform (C:\Users\kamil.topolewski\AppData\Roaming\npm\node_modules\dyno\bin\cli.js:94:25)
at Transform._read (_stream_transform.js:186:10)
at Transform._write (_stream_transform.js:174:12)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Transform.Writable.write (_stream_writable.js:290:11)
at Stream.ondata (internal/streams/legacy.js:16:26)
at emitOne (events.js:116:13)
events.js:183
throw er; // Unhandled 'error' event
^

Error: write EPIPE
at _errnoException (util.js:1024:11)
at Socket._writeGeneric (net.js:767:25)
at Socket._write (net.js:786:8)
at doWrite (_stream_writable.js:387:12)
at writeOrBuffer (_stream_writable.js:373:5)
at Socket.Writable.write (_stream_writable.js:290:11)
at Socket.write (net.js:704:40)
at PassThrough.ondata (_stream_readable.js:639:20)
at emitOne (events.js:116:13)
at PassThrough.emit (events.js:211:7)

No commit statuses or check runs found!

👋 Hey there! It's Changebot, and I help repositories follow our engineering best practices. My magic wand found some things I wanted to highlight for your review:

Item                                       | Current status | Best practice guidelines
Number of status checks at time of merging | 0              | >= 1

Could not find any status checks for this PR: #108

Can you take a look at these best practices and make any adjustments if needed?

Please visit my status check docs if you have any questions.

incremental-snapshot creates extra output in snapshot

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE s3://$BackupBucket/${TABLE}-snapshot

sometimes this leaves empty lines in the snapshot output - for example:

aws s3 cp s3://$BackupBucket/${TABLE}-snapshot - | gzcat

{"what":{"S":"new1"},"a":{"S":"b"}}
{"what":{"S":"new2"},"a":{"S":"11"}}
{"what":{"S":"a"},"b":{"S":"ccd"}}
{"what":{"S":"test2"},"a":{"S":"asdf"}}
{"what":{"S":"new10"},"a":{"S":"b"}}
{"what":{"S":"sdfg"},"a":{"S":"asdf"}}
{"what":{"S":"asdf"},"aa":{"S":"bb"}}
{"what":{"S":"test"},"a":{"S":"test1"}}

{"what":{"S":"new"},"a":{"S":"fish faster 8"}}
{"what":{"S":"test4"}}
{"what":{"S":"b"},"a":{"S":"aa"},"b":{"S":"cc"}}

This one, anyway, is easy enough to handle during the uncompress ( | gzcat | grep -v "^$" ) - but...

Removed the other case; I found where that data came from, and it was on me.

Error Running diff-tables with --repair or --backfill

Hi,

First, thanks for a great tool. Provides a viable alternative to the relative black box that is the official AWS solution.

I have configured Dynamo replication using the replicator function with Lambda, but was keen to use the diff-tables script to attain a bit of confidence in what was being replicated. Unfortunately it fails whenever I attempt to pass the --repair or --backfill flag (i.e. to actually make any changes).

The stack trace is as follows:

/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:30
            throw err;
                  ^
TypeError: Object.keys called on non-object
    at Function.keys (native)
    at Response.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/diff.js:206:32)
    at Request.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:353:18)
    at Request.callListeners (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:595:14)
    at Request.transition (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:21:10)
    at AcceptorStateMachine.runTo (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/local/lib/node_modules/dynamodb-replicator/node_modules/aws-sdk/lib/request.js:37:9)

Any input would be appreciated!

dyno failing to restore incremental backup?

simple test case with a small table:

ENV=dev
TABLE=dsrtest2

. config.env.$ENV

bin/incremental-backfill.js $AWS_REGION/$TABLE s3://$BackupBucket/$BackupPrefix

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE  s3://$BackupBucket/${TABLE}-snapshot

s3print s3://$BackupBucket/${TABLE}-snapshot | dyno put $AWS_REGION/dsr-test-restore-$TABLE

%% sh test-backup.sh
12 - 11.89/s
[Fri, 09 Dec 2016 23:54:59 GMT] [info] [incremental-snapshot] Starting snapshot from s3://dsr-ddb-rep-testing/testprefix/dsrtest2 to s3://dsr-ddb-rep-testing/dsrtest2-snapshot
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Starting upload of part #0, 0 bytes uploaded, 12 items uploaded @ 6.26 items/s
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Uploaded snapshot to s3://dsr-ddb-rep-testing/dsrtest2-snapshot
[Fri, 09 Dec 2016 23:55:01 GMT] [info] [incremental-snapshot] Wrote 12 items and 148 bytes to snapshot
undefined:1
�
^

SyntaxError: Unexpected token  in JSON at position 0
    at Object.parse (native)
    at Function.module.exports.deserialize (/Users/draistrick/git/github/dynamodb-replicator/node_modules/dyno/lib/serialization.js:49:18)
    at Transform.Parser.parser._transform (/Users/draistrick/git/github/dynamodb-replicator/node_modules/dyno/bin/cli.js:94:25)
    at Transform._read (_stream_transform.js:167:10)
    at Transform._write (_stream_transform.js:155:12)
    at doWrite (_stream_writable.js:307:12)
    at writeOrBuffer (_stream_writable.js:293:5)
    at Transform.Writable.write (_stream_writable.js:220:11)
    at Stream.ondata (stream.js:31:26)
    at emitOne (events.js:96:13)

Next step would be to diff the two tables - but the pipe to dyno fails. I've tried 1.0.0 and 1.3.0 with the same result.

What data format is dyno expecting? The file on s3 (tried multiple tables including real data tables) is a binary blob?

cheese:~%% aws --region=us-west-2 s3 cp s3://dsr-ddb-rep-testing/dsrtest-snapshot -
m�1�
��ߠl�EG�EB�uL0\�Tuq�ݵ#������$L�6�/8�%Z�r�[d�p
���5h)��X�ֻ�j�ƪ�
 ۘ��&�WJ'❑��`�T��������􁒷
cheese:~%%

So maybe this is a problem with backfill? Or am I missing something? :)

2016-12-09 18:54:35 149 dsrtest-snapshot
2016-12-09 18:55:01 148 dsrtest2-snapshot
2016-12-09 18:37:20 1428 receipt_log_dev-01-snapshot
2016-12-09 18:53:15 13457328 showdownlive_dev-01-snapshot

dynamodb-incremental backup

Hey,

Thanks a lot for creating this utility. It would be really helpful if you could go through the steps below and let me know what I am missing.

I have configured the utility to take incremental backups of a DynamoDB table, but I am not sure of the series of steps required to successfully implement incremental backup. Below are the steps I followed:

  1. execute incremental-backfill, which created the single file for all items in table in s3 location.
  2. enabled version control for the s3 bucket location.
  3. enabled streams on dynamodb table.
  4. created lambda function for capturing the update/delete/insert from the stream for the dynamodb table.
  5. performed updates on few items in the table.
  6. executed incremental-backfill, to take the backup again.

While executing step 6, all the items were backed up again, while only the updated items should have been backed up.

I am not sure what the next step should be for a successful implementation of the utility.

incremental-snapshot doesn't handle S3 timeouts well

incremental-snapshot.js doesn't seem to handle S3 timeouts very well, leaving a broken (partial, missing, or otherwise incomplete) snapshot in its wake:

bin/incremental-snapshot.js s3://$BackupBucket/$BackupPrefix/$TABLE s3://$BackupBucket/${TABLE}-snapshot

[Tue, 10 Jan 2017 17:12:49 GMT] [info] [incremental-snapshot] Starting snapshot from s3://dsr-ddb-rep-testing/testprefix/showdownlive_gamedata_dev-01 to s3://dsr-ddb-rep-testing/showdownlive_gamedata_dev-01-snapshot
[Tue, 10 Jan 2017 17:12:59 GMT] [info] [incremental-snapshot] Starting upload of part #0, 0 bytes uploaded, 3000 items uploaded @ 297.65 items/s
[Tue, 10 Jan 2017 17:13:06 GMT] [error] [incremental-snapshot] TimeoutError: Connection timed out after 1000ms
    at ClientRequest.<anonymous> (/Users/draistrick/git/github/dynamodb-replicator/node_modules/aws-sdk/lib/http/node.js:56:34)
    at ClientRequest.g (events.js:286:16)
    at emitNone (events.js:86:13)
    at ClientRequest.emit (events.js:185:7)
    at TLSSocket.emitTimeout (_http_client.js:614:10)
    at TLSSocket.g (events.js:286:16)
    at emitNone (events.js:91:20)
    at TLSSocket.emit (events.js:185:7)
    at TLSSocket.Socket._onTimeout (net.js:333:8)
    at tryOnTimeout (timers.js:228:11)
    message: Connection timed out after 1000ms
    code: NetworkingError
    region: us-west-2
    hostname: dsr-ddb-rep-testing.s3-us-west-2.amazonaws.com

This case also exits 0 instead of exiting with an error, so it is hard to handle externally.

dynamodb-replicator needs a team as a repo admin

👋 Hey there! It's Changebot, and I help repositories follow our engineering best practices. My magic wand found some things I wanted to highlight for your review:

Item                                 | Current status | Best practice guidelines
Teams enabled on dynamodb-replicator | None           | Your team

To follow least privilege best practices, please add your team as the repo admin.

Can you take a look at these best practices and make any adjustments if needed?

Please tag @mapbox/security-and-compliance on this issue if you have any questions

Comparison of serialized features

mapbox/dyno#86 is exposing a Dyno.serialize() function that should better support the "new" DynamoDB data types. We should use the string result of this function to compare two objects instead of an assert.deepEqual.
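
A minimal sketch of that comparison, assuming the Dyno.serialize() from mapbox/dyno#86:

// Sketch: compare two items by their serialized string form instead of
// assert.deepEqual, sidestepping the Buffer comparison problem above.
var Dyno = require('dyno');

function itemsEqual(a, b) {
    return Dyno.serialize(a) === Dyno.serialize(b);
}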

Let's branch off from #21 to try this out.

cc @jakepruitt

Cross-Account Replication

Hello,

Do you have any plans to include cross-account replication between DynamoDB tables?

Thanks!
Pierre

Replicate to a table with a different key schema

A bit of an unusual use case, but when trying to replicate to a destination table with a different key schema than the source table, this line is problematic. We are assuming both tables have identical key schemas.

We could work around this by allowing the user to define the destination key schema explicitly, or by looking up the key schema for the destination table (see the sketch below).
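
A sketch of the lookup approach (a hypothetical helper, not part of the library):

// Sketch: look up the replica table's key schema rather than assuming
// it matches the primary. Hypothetical helper for illustration only.
var AWS = require('aws-sdk');

function getKeySchema(region, table, callback) {
    var dynamodb = new AWS.DynamoDB({ region: region });
    dynamodb.describeTable({ TableName: table }, function(err, data) {
        if (err) return callback(err);
        // e.g. [{ AttributeName: 'id', KeyType: 'HASH' }]
        callback(null, data.Table.KeySchema);
    });
}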

CLI tool to replicate a single record

If a record is identified as out-of-sync between the primary and replica tables, it would be convenient to be able to run a CLI command to bring them in sync.

Throughput Exceptions

Hi guys,

I've gotten some throughput-exceeded exceptions, which are fine because I provisioned my tables with a very low number, but is there some kind of exponential backoff feature in the tool? (I didn't see any; maybe I looked in the wrong place.)

Also, is there a way to limit the capacity used by the tool?
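
(For context: no backoff flag is documented here, but the aws-sdk client the tool is built on supports retry tuning. A sketch of the standard SDK options, which are not options of this tool:)

// Sketch: standard aws-sdk retry knobs, shown for context only;
// dynamodb-replicator does not document flags that map to these.
var AWS = require('aws-sdk');

var dynamodb = new AWS.DynamoDB({
    region: 'us-east-1',
    maxRetries: 10,                  // retry throttled requests more times
    retryDelayOptions: { base: 300 } // base delay (ms) for exponential backoff
});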

Any suggestions welcome!

Thank you

incremental backup and incremental backfill generate different file names

Hi there!

First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.

I have this simple lambda function that is subscribed to the tables I want to update:
(the bucket, region, and prefix are set as env variables in the lambda function)

var replicator = require('dynamodb-replicator')
module.exports.streaming = (event, context, callback) => {
  return replicator.backup(event, callback)
}

Then I ran the backfill by importing dynamodb-replicator/s3-backfill and passing it a config object.

However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill.

I see that the formula for generating filenames is slightly different.

// backfill
            var id = crypto.createHash('md5')
                .update(Dyno.serialize(key))
                .digest('hex');

// backup
            var id = crypto.createHash('md5')
                .update(JSON.stringify(change.dynamodb.Keys))
                .digest('hex');

https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132
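
A quick demonstration of the divergence, using a hypothetical record key and the two hashing recipes above:

// Sketch: the two recipes hash different strings, so the same record
// key can land at two different S3 object names.
var crypto = require('crypto');
var Dyno = require('dyno');

var key = { id: { S: 'abc' } }; // hypothetical record key

var backfillId = crypto.createHash('md5')
    .update(Dyno.serialize(key))
    .digest('hex');

var backupId = crypto.createHash('md5')
    .update(JSON.stringify(key))
    .digest('hex');

console.log(backfillId === backupId); // false whenever the strings differ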

Does this make any practical difference? Should the restore function work regardless?

Restore incremental backups?

Love the incremental backups to S3; they work really well.

What do you use for restoring these incremental backups?

i.e., the latest copy from S3 restored to DynamoDB, or restoring a point in time from S3 to DynamoDB.

Documentation: real world user documentation

Would it be possible to get some real-world, complete setup examples for using this tool?

Before I go further - I appreciate the hard work involved in getting this tool this far, and don't take my critical comments below in the wrong context. I'm trying to help improve the user experience for possibly the best and only complete replication and backup/restore tool for dynamo that exists today!! :)

The current documentation is just a marketing glossy - a user doesn't even have a feature-to-tool map. Where is "A replicator function that processes events from a DynamoDB stream"? What file? I have to become an expert in node, lambda, DDB streams, and streambot to be able to consume this project.

A more complete walkthrough (even if it's only an example setup) would be great - aws cli, aws web console, just something to help a user understand all of the pieces required (and to know how to skip the pieces that are not required).

How do I setup/config DDB's streams for the purpose of using this tool?

How do I setup lambda?

IAM as it applies to the tool and related services?

s3 as it relates to the tool? (specific bucket setup requirements for this usage?)

Other areas of concern - How does using the tool for replication, and for backups, impact ddb scaling in various scenarios?

Thanks - I'd love to use it, but trying to figure out how to use this is going to be a huge undertaking and trial/error exercise.

Even just a high level sketch of the pieces, without tons of detail, would be a great starting point for us to contribute to.

Trouble setting up the Lambda Function

Hey,

Thanks for open sourcing this great resource.

I have been trying to set it up, but I get the error below when setting up the replicator as a Lambda function.

Would you be able to take a look and see if it's something obvious? :-)

I have manually set up the following environment variables like so:
process.env.ReplicaTable = "testTableReplica"
process.env.ReplicaRegion = "us-west-2"
process.env.ReplicaEndpoint = "https://dynamodb.us-west-2.amazonaws.com"

Are there any others I have missed?

I call index.replicate as the function name

I have tried diff-tables us-west-2/testTable us-west-2/testTableReplica --backfill and that worked without a hitch, so I am certain it's not a difference in the tables, etc.

I am looking into using Streambot for the real deployment, which looks sweet as it removes the configuration from the code entirely. I figured the best way was to get a basic example up and running first, then translate that into the streambot js.

2015-12-04T09:16:14.357Z    8ff15f9e-ca79-478b-9794-1e528a76d52a    [failed-request] request-id: undefined | id-2: undefined | params:
{
    "RequestItems": {
        "testTableReplica": [
            {
                "PutRequest": {
                    "Item": {
                        "Id": {
                            "S": "ddddddd"
                        }
                    }
                }
            },
            {
                "PutRequest": {
                    "Item": {
                        "Id": {
                            "S": "hjhgfhgfhg"
                        }
                    }
                }
            }
        ]
    }
}

Cheers
Adrian
