tbarbugli / cassandra_snapshotter

A tool to backup cassandra nodes using snapshots and incremental backups on S3

License: Other

Python 100.00%

cassandra_snapshotter's People

Contributors

arikfr, bart613, bitdeli-chef, chrislovecnm, cunningbaldrick, dagvl, fatelei, gdhagger, jamesrwhite, jyotman, matkam, pauloricardomg, qconner, rhardouin, sirio7g, sppaikra28, tbarbugli, tonylixu, winks


cassandra_snapshotter's Issues

pkg_resources.DistributionNotFound: boto>=2.29.1

I installed on datastax AMI using the directions in README. Here's the error:

$ cassandra-snapshotter --aws-access-key-id="**********" --aws-secret-access-key="**********" --s3-bucket-name="*********" --s3-bucket-region="us-west-2" --s3-base-path="" backup --hosts=cassandra

[cassandra] Executing task 'node_start_backup'
[cassandra] Executing task 'upload_node_backups'
[cassandra] sudo: cassandra-snapshotter-agent  create-upload-manifest --manifest_path=/tmp/backupmanifest --snapshot_name=20140918010604 --snapshot_keyspaces= --snapshot_table= --data_path=/var/lib/cassandra/data/
[cassandra] out: Traceback (most recent call last):
[cassandra] out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 5, in <module>
[cassandra] out:     from pkg_resources import load_entry_point
[cassandra] out:   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2711, in <module>
[cassandra] out:     parse_requirements(__requires__), Environment()
[cassandra] out:   File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 584, in resolve
[cassandra] out:     raise DistributionNotFound(req)
[cassandra] out: pkg_resources.DistributionNotFound: boto>=2.29.1
[cassandra] out:

Fatal error: sudo() received nonzero return code 1 while executing!

Support for PyChef?

We're using Chef, and PyChef to integrate it with Fabric. If we add integrated support for PyChef in cassandra_snapshotter, is it something you would merge in?

The integration would work by passing a Chef search string instead of a node list when starting the tool, for example --chef instead of --nodes.
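A minimal sketch of what the proposed CLI surface could look like (the --chef flag is the issue's proposal; --hosts is the tool's existing flag; everything else is illustrative). The actual PyChef lookup would then resolve the search string to node names:

```python
import argparse

# Sketch: --hosts and --chef become two mutually exclusive ways to select nodes.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument('--hosts', help='comma separated list of Cassandra nodes')
group.add_argument('--chef',
                   help="Chef search string resolved via PyChef, e.g. 'role:cassandra'")

args = parser.parse_args(['--chef', 'role:cassandra'])
print(args.chef)  # → role:cassandra
```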

Improve documentation on ssh key usage

Hi all

I am looking for better documentation on how to use just ssh keys and no password. I am getting an error from Fabric that sudo requires a password. Do I need to set up sudo on the remote boxes to allow running the backup without a password?
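For what it's worth, the usual way to let Fabric's sudo() run without a password prompt is a NOPASSWD sudoers entry for the SSH user on each node (user name and file path below are illustrative; install with visudo):

```
# /etc/sudoers.d/cassandra-snapshotter  (illustrative)
ubuntu ALL=(ALL) NOPASSWD: ALL
```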

Thanks

Chris

Pickling error with LZO

I get the following error from each of my nodes:

[10.x.y.z] out: lzop 1.03
[10.x.y.z] out: LZO library 2.06
[10.x.y.z] out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
[10.x.y.z] out: Exception in thread Thread-2:
[10.x.y.z] out: Traceback (most recent call last):
[10.x.y.z] out:   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
[10.x.y.z] out:     self.run()
[10.x.y.z] out:   File "/usr/lib/python2.7/threading.py", line 504, in run
[10.x.y.z] out:     self.__target(*self.__args, **self.__kwargs)
[10.x.y.z] out:   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
[10.x.y.z] out:     put(task)
[10.x.y.z] out: PicklingError: Can't pickle <type '_hashlib.HASH'>: attribute lookup _hashlib.HASH failed

And then the snapshot seems painfully slow: it never quits, the processes keep running (though only using about 1% CPU), and 2h+ later a 9 GB snapshot still hasn't completed.
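For context on the PicklingError above: multiprocessing.Pool pickles every task argument it sends to a worker, and a live _hashlib.HASH object is not picklable. A common workaround, sketched here with an illustrative helper (not the snapshotter's actual code), is to pass only plain paths and construct the hash inside the worker:

```python
import hashlib

def checksum(path):
    """Runs inside the worker: only the picklable path crosses the process pipe."""
    h = hashlib.md5()  # the HASH object is created here, never pickled
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

# usage: multiprocessing.Pool(4).map(checksum, list_of_paths)
```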

Add upload rate limiting

Should be really easy to do (pv seems the right tool for the job):

  1. parse the option (e.g. 100MB/s, 10Mb/s, 1Gb/s)
  2. append the limiter to the existing lzop pipe
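A sketch of step 1, assuming rates are read as bytes per second (the bit-vs-byte distinction in units like Mb vs MB is glossed over here, and the function name is illustrative):

```python
import re

UNITS = {'b': 1, 'kb': 10 ** 3, 'mb': 10 ** 6, 'gb': 10 ** 9}

def parse_rate(text):
    """Turn a rate like '100MB/s' or '1Gb/s' into a bytes-per-second integer."""
    m = re.match(r'^(\d+)\s*([kmg]?b)/s$', text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError('unparseable rate: %r' % text)
    return int(m.group(1)) * UNITS[m.group(2).lower()]

# Step 2 would then splice a limiter such as `pv -q -L <bytes_per_sec>`
# between lzop and the S3 upload in the existing pipe.
```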

backup failure

Hi,
I'm running the latest version on Ubuntu 14 and C* 2.0.8.
I run the command:
ubuntu@cas1:~$ cassandra-snapshotter --aws-access-key-id XXXXXXXXXX --aws-secret-access-key XXXXXXXXXX --s3-bucket-name anodot-cas-backup-staging --s3-base-path monitoring backup --host=cas1 --keyspace=combined --table=definition
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621121914//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621123553//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621123751//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621123806//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621124654//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621124849//manifest.json'
WARNING:root:Response: u'Access Denied' manifest_path: u'monitoring/20150621130515//manifest.json'
[cas1] Executing task 'node_start_backup'
[cas1] Executing task 'upload_node_backups'
[cas1] sudo: cassandra-snapshotter-agent create-upload-manifest --manifest_path=/tmp/backupmanifest --snapshot_name=20150621145211 --snapshot_keyspaces=combined --snapshot_table=definition --data_path=/var/lib/cassandra/data/
[cas1] sudo: cassandra-snapshotter-agent put --aws-access-key-id=XXXXXXXX --aws-secret-access-key=XXXXXXX --s3-bucket-name=anodot-cas-backup-staging --s3-bucket-region=us-east-1 --s3-base-path=monitoring/20150621145211/cas1 --manifest=/tmp/backupmanifest --concurrency=4
[cas1] out: lzop 1.03
[cas1] out: LZO library 2.06
[cas1] out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
[cas1] out:

[cas1] Executing task 'clear_node_snapshot'
[cas1] sudo: /usr/bin/nodetool clearsnapshot -t "20150621145211"
[cas1] out: xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[cas1] out: Requested clearing snapshot for: all keyspaces
[cas1] out:

[cas1] sudo: /usr/bin/nodetool ring

On S3 I'm unable to see the backup directory.

Thanks in advance

Shay.

please generate keyspaces and tables when restoring

When restoring a keyspace, it doesn't get created before loading the sstables, which generates the following error:

INFO:root:invoking: sstableloader --nodes localhost -v MY_KEYSPACE/MY_TABLE
Could not retrieve endpoint ranges:
InvalidRequestException(why:No such keyspace: MY_KEYSPACE)
Run with --debug to get full stack trace or --help to get help.

Even after creating the keyspace, the tables don't get created:
...
Skipping file IP_MY_KEYSPACE-MY_TABLE-jb-42-Data.db: column family MY_KEYSPACE.MY_TABLE doesn't exist
...

Nodetool Authentication

Hello,

This tool looks great so far. I notice that I can set the nodetool path, so I tell the script where to find nodetool. However, if the JMX setup has authentication on it, nodetool will fail (as it needs -u and -pw params). Does this tool take that into account?
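Supporting this would mostly mean threading the optional JMX credentials into every nodetool invocation; -u and -pw are nodetool's own flags, while the helper below is purely illustrative:

```python
def nodetool_cmd(nodetool_path, subcommand, jmx_user=None, jmx_pass=None):
    """Build a nodetool command line, inserting -u/-pw only when configured."""
    parts = [nodetool_path]
    if jmx_user:
        parts += ['-u', jmx_user]
    if jmx_pass:
        parts += ['-pw', jmx_pass]
    parts.append(subcommand)
    return ' '.join(parts)

print(nodetool_cmd('/usr/bin/nodetool', 'ring', jmx_user='admin', jmx_pass='s3cret'))
# → /usr/bin/nodetool -u admin -pw s3cret ring
```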

Strange boto error on `upload_node_backups`

Hi,
I just started using cassandra_snapshotter and I got a similar error twice when running it on my cluster (4 nodes, each seems to have ~18GB of data)

[172.31.10.42] out: boto.exception.S3ResponseError: S3ResponseError: 200 OK

Full log of the run is here

Here are the outputs of s3cmd du (converted to GB) on the path/CLUSTERNAME/IP folders in S3. As you can see, the 1st pair is missing some data, while on my first try (2nd pair) it's not so obvious.

17.74581958632916212081 run 1, node 1
12.03844990301877260208 run 2

17.30710690561681985855 run 1, node 2
17.32890863530337810516 run 2

18.47752564307302236557 run 1, node 3
18.50679215136915445327 run 2

17.72296549659222364425 run 1, node 4
17.74818217754364013671 run 2

TL;DR: I think either there's something wrong with S3 atm, or cassandra_snapshotter should really retry uploading the snapshot when there's a boto error...

Not sure if this issue is related.

add a command to check for common issues

Some commands are run in parallel against multiple nodes; this sometimes makes debugging simple issues quite complex (especially for non-Python folks).

It would be great to have a validate command that spots the most common configuration mistakes:

  • broken paramiko (e.g. missing python dev libs)
  • user needs to be in the sudoers group (with no password prompt)
  • missing lzop
  • misconfigured S3 bucket / invalid keys
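A couple of these checks are cheap to sketch (function names are hypothetical; shutil.which is Python 3):

```python
import shutil
import subprocess

def check_lzop():
    """lzop must be on PATH on every node for compression to work."""
    return shutil.which('lzop') is not None

def check_passwordless_sudo():
    """`sudo -n true` exits non-zero immediately (instead of prompting)
    if a password would be required."""
    try:
        return subprocess.call(['sudo', '-n', 'true']) == 0
    except OSError:
        return False  # no sudo binary at all
```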

snapshots fail with permission denied.

The first time I run the tool, it works well, the subsequent times it gives an error:

[cassandrahost] out: Traceback (most recent call last):
[cassandrahost] out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 9, in <module>
[cassandrahost] out:     load_entry_point('cassandra-snapshotter==0.4.0', 'console_scripts', 'cassandra-snapshotter-agent')()
[cassandrahost] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 266, in main
[cassandrahost] out:     args.incremental_backups
[cassandrahost] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 151, in put_from_manifest
[cassandrahost] out:     os.remove(f)
[cassandrahost] out: OSError: [Errno 13] Permission denied: '/mnt/cassandra/data/system/schema_triggers-0359bc7171233ee19a4ab9dfb11fc125/backups/la-4-big-Filter.db'

Are these files in the "backups" directory above something left over from before? There are a bunch of such files and it keeps failing on them. The permissions on these files are:

ls -l /mnt/cassandra/data/system/schema_triggers-0359bc7171233ee19a4ab9dfb11fc125/backups/la-4-big-Filter.db
-rw-r--r-- 1 cassandra cassandra 16 Feb 7 15:00 /mnt/cassandra/data/system/schema_triggers-0359bc7171233ee19a4ab9dfb11fc125/backups/la-4-big-Filter.db

I am running the snapshotter from my Mac, which SSHes into the node using the ubuntu user. Any suggestions why it is failing for me? Is it the permissions on the file? If so, why am I the only one hitting this issue?

Restore keyspace fails

I am getting the following error during restore. Any idea what is causing this failure?

Traceback (most recent call last):
  File "/usr/local/bin/cassandra-snapshotter", line 9, in <module>
    load_entry_point('cassandra-snapshotter==1.0.0', 'console_scripts', 'cassandra-snapshotter')()
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/main.py", line 284, in main
    restore_backup(args)
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/main.py", line 110, in restore_backup
    args.s3_bucket_name
TypeError: __init__() takes exactly 6 arguments (5 given)

My restore command is as follows:
cassandra-snapshotter --s3-bucket-name=mybucket --s3-bucket-region=us-west-2 --s3-base-path=/backups --aws-access-key-id=accesskey --aws-secret-access-key=secretkey restore --snapshot-name=LATEST --keyspace=mykeyspace --hosts=host1 --target-hosts=host1

Add support for ssh config

Does this script support use of SSH config files? It doesn't seem to work for me. I have a host, user, port, and IdentityFile defined in my ~/.ssh/config file, but the script still says:

Fatal error: Needed to prompt for a connection or sudo password (host: my_vm)

I have verified that no password is required for SSH nor sudo once logged in. Here's the config:

Host my_vm
Hostname 127.0.0.1
Port=1122
User=ubuntu
IdentityFile ~/.ssh/id_rsa
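For what it's worth, Fabric 1.x only honors ~/.ssh/config when env.use_ssh_config is set to True, so the snapshotter would need to enable that (or expose it as a flag). As a stdlib-only illustration of the fields such support has to read (not Fabric's code; assumes the simple 'Key value' / 'Key=value' syntax shown above):

```python
import re

def read_ssh_config(text, host):
    """Return the Host block for `host` as a dict of lowercase keys."""
    entries, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        key, value = re.split(r'[=\s]+', line, maxsplit=1)
        if key.lower() == 'host':
            current = value
            entries[current] = {}
        elif current is not None:
            entries[current][key.lower()] = value
    return entries.get(host, {})
```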

strange error when running backup

I have no idea what this means... is this a version issue?

[10.1.3.151] out: lzop 1.03
[10.1.3.151] out: LZO library 2.06
[10.1.3.151] out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
[10.1.3.151] out: Exception in thread Thread-2:
[10.1.3.151] out: Traceback (most recent call last):
[10.1.3.151] out:   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
[10.1.3.151] out:     self.run()
[10.1.3.151] out:   File "/usr/lib/python2.7/threading.py", line 504, in run
[10.1.3.151] out:     self.__target(*self.__args, **self.__kwargs)
[10.1.3.151] out:   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
[10.1.3.151] out:     put(task)
[10.1.3.151] out: PicklingError: Can't pickle <type '_hashlib.HASH'>: attribute lookup _hashlib.HASH failed
[10.1.3.151] out:

project isn't published on pypi

$ pip install cassandra_snapshotter
Downloading/unpacking cassandra-snapshotter
  Could not find any downloads that satisfy the requirement cassandra-snapshotter
No distributions at all found for cassandra-snapshotter
Storing complete log in /Users/arikfr/.pip/pip.log

Project isn't on pypi?

Only get "manifest.json" and "ring" in my backups

Hi :)

Trying to get this to work on my EC2 instance...

When I run cassandra-snapshotter on my instance, I get the following output:

[x.x.x.x] Executing task 'node_start_backup'
[x.x.x.x] Executing task 'upload_node_backups'
[x.x.x.x] sudo: cassandra-snapshotter-agent  create-upload-manifest --manifest_path=/tmp/backupmanifest --snapshot_name=20140724110014 --snapshot_keyspaces= --snapshot_table= --data_path=/var/lib/cassandra/data/
[x.x.x.x] sudo: cassandra-snapshotter-agent  put --aws-access-key-id=<SNIP> --aws-secret-access-key=<SNIP> --s3-bucket-name=<SNIP> --s3-bucket-region=us-west-2 --s3-ssenc --s3-base-path=US_OR_Parser/20140724110014/x.x.x.x --manifest=/tmp/backupmanifest --concurrency=4
[x.x.x.x] out: lzop 1.03
[x.x.x.x] out: LZO library 2.06
[x.x.x.x] out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
[x.x.x.x] out:

[x.x.x.x] Executing task 'clear_node_snapshot'
[x.x.x.x] sudo: /usr/bin/nodetool clearsnapshot -t "20140724110014"
[x.x.x.x] out: Requested clearing snapshot for: all keyspaces
[x.x.x.x] out:

[x.x.x.x] sudo: /usr/bin/nodetool ring

And then it exits after a few seconds (code 0).

On S3, it creates folders, but the only two files it puts there are "manifest.json" and "ring". There are no backup files.

incremental_backups is enabled in cassandra.yaml, and JNA is also installed.

Adding --new-snapshot doesn't help.

Am I missing something obvious here, or is something wrong?

Thanks :)

Support IAM roles

It seems that AWS credentials are necessary to run the tool. It would be great to make them optional so that the script uses the IAM role instead.

Unable to take backup. MalformedXML error

Hi,
I am making the following call but receiving a MalformedXML error. Any idea what I am missing?

cassandra-snapshotter --aws-access-key-id **** --aws-secret-access-key **** --s3-bucket-region **** --s3-bucket-name **** --s3-base-path **** backup --hosts localhost --nodetool-path ~/cassandra/cassandra/bin/nodetool --cassandra-bin-dir ~/cassandra/cassandra/bin/ --user ubuntu --cassandra-conf-path ~/cassandra/cassandra/conf

Do I need to create any policy on the S3 bucket side? Currently I have not configured any policy there. Any pointer here would be of great help. Thanks!

Output:
[localhost] out: lzop 1.03
[localhost] out: LZO library 2.06
[localhost] out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
[localhost] out: lzop: lzop.c:351: f_open: Assertion `ft->name[0]' failed.
[localhost] out: lzop: lzop.c:351: f_open: Assertion `ft->name[0]' failed.
[localhost] out: lzop: lzop.c:351: f_open: Assertion `ft->name[0]' failed.
[localhost] out: lzop: lzop.c:351: f_open: Assertion `ft->name[0]' failed.
[localhost] out: Traceback (most recent call last):
[localhost] out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 9, in <module>
[localhost] out:     load_entry_point('cassandra-snapshotter==0.4.0', 'console_scripts', 'cassandra-snapshotter-agent')()
[localhost] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 266, in main
[localhost] out:     args.incremental_backups
[localhost] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 145, in put_from_manifest
[localhost] out:     for _ in pool.imap(upload_file, ((bucket, f, destination_path(s3_base_path, f), s3_ssenc, buffer_size) for f in files)):
[localhost] out:   File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
[localhost] out:     raise value
[localhost] out: boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
[localhost] out: <Error><Code>MalformedXML</Code><Message>The XML you provided was not well-formed or did not validate against our published schema</Message><RequestId>0056A9AE61068765</RequestId><HostId>PEIIsIHjnwy/331DR/DokeyFCWRIKqSxUcV/W31WXmNXmHYCiW5W4iZgF+bV5MvHZet6jx+ufIM=</HostId></Error>
[localhost] out: lzop: lzop.c:351: f_open: Assertion `ft->name[0]' failed.
[localhost] out:

pkg_resources.DistributionNotFound: ecdsa>=0.11

How: followed installation instructions (apt reported successful installation)
What: I get the following error when I type cassandra-snapshotter --help

ubuntu@vsk01:~$ cassandra-snapshotter --help
Traceback (most recent call last):
  File "/usr/local/bin/cassandra-snapshotter", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2749, in <module>
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 444, in _build_master
    ws.require(requires)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 725, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 628, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: ecdsa>=0.11
Environment:

ubuntu@vsk01:~$ uname -a
Linux vsk01 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@vsk01:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

Dependencies listed:

ubuntu@vsk01:~$ pip show cassandra-snapshotter

Name: cassandra-snapshotter
Version: 0.4.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: argparse, fabric, boto

What is ecdsa, and why is this dependency not met? Thanks!

Docker version

Hey,
I would like to change the module to use the docker-swarm discovery engine to execute the commands that produce the snapshots.
We are working in a docker-swarm environment, so the snapshotter will also be containerized.

Any quick advice on how / where to start ?

Update documentation for restore and other cool features

Hey

I will probably knock this out myself, but this awesome tool may be lacking some documentation ... #justsayin

For instance:

cassandra-snapshotter -v --aws-access-key-id=redacted --aws-secret-access-key=redacted --s3-bucket-name=your_bucket_name --s3-bucket-region=us-west-1 --s3-ssenc --s3-base-path=your_base_path restore --keyspace=your_keyspace --target-hosts=cassandra01,cassandra02

Also add more details about what happens during a restore: for instance, the program is going to download the snapshot out of the S3 repo to the machine you ran the command on, and then run sstableloader. You need disk space and sstableloader. Also, what are the details on prepping the cluster? Do you need the schema to exist?

I am thinking of providing examples of various commands, plus more details on getting snapshots running properly. I am guessing that you have to list the keyspaces to get snapshotting running correctly.

Here is an example of a list command:

cassandra-snapshotter -v --aws-access-key-id=redacted --aws-secret-access-key=redacted --s3-bucket-name=your_bucket_name --s3-bucket-region=us-west-1 --s3-ssenc --s3-base-path=your_base_path list

Also we may want to link to https://help.ubuntu.com/community/SSH/OpenSSH/Keys for instructions on creating ssh shared keys.

The topics areas that I see need some TLC are:

  • restore
  • ssh keys
  • list
  • other command line switches that are not documented

Thoughts? Comments?

Error: backups fail on exception

Hosts periodically throw the error below, causing backups to fail.

boto-2.38
python-2.7.6
cassandra_snapshotter 0.4.0

[db4a.] out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 9, in <module>
[db4a.] out:     load_entry_point('cassandra-snapshotter==0.4.0', 'console_scripts', 'cassandra-snapshotter-agent')()
[db4a.] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 266, in main
[db4a.] out:     args.incremental_backups
[db4a.] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 145, in put_from_manifest
[db4a.] out:     for _ in pool.imap(upload_file, ((bucket, f, destination_path(s3_base_path, f), s3_ssenc, buffer_size) for f in files)):
[db4a.] out:   File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
[db4a.] out:     raise value
[db4a.] out: boto.exception.S3ResponseError: S3ResponseError: 200 OK
[db4a.] out: <?xml version="1.0" encoding="UTF-8"?>
[db4a.] out: <Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>RIDHERE</RequestId><HostId>IDHERE</HostId></Error>
[db4a.] out: 

Fatal error: run() received nonzero return code 1 while executing!

Requested: cassandra-snapshotter-agent put (REST OF COMMAND)

Aborting.

S3 InternalError: "We encountered an internal error"

Hello, we are using cassandra_snapshotter to take snapshots but get the following error regularly:

out: lzop 1.03
out: LZO library 2.06
out: Copyright (C) 1996-2010 Markus Franz Xaver Johannes Oberhumer
out: Traceback (most recent call last):
out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 9, in <module>
out:     load_entry_point('cassandra-snapshotter==0.4.0', 'console_scripts', 'cassandra-snapshotter-agent')()
out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 266, in main
out:     args.incremental_backups
out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 145, in put_from_manifest
out:     for _ in pool.imap(upload_file, ((bucket, f, destination_path(s3_base_path, f), s3_ssenc, buffer_size) for f in files)):
out:   File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
out:     raise value
out: boto.exception.S3ResponseError: S3ResponseError: 200 OK
out:
out: <Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</RequestId></Error>

Can someone please help with why we are getting this when running snapshots? We have 5 nodes, each with about 100 GB of data.

Support EC2 IAM roles

Please support usage of IAM roles.

This will allow EC2 machines to utilize their machine credentials for S3 access rather than requiring an access/secret key to be hardcoded.
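Making the key flags optional is likely most of the work: boto's connect_s3 accepts None credentials and then falls back to its own provider chain (environment variables, boto config files, and on EC2 the instance-profile metadata service). A sketch of the argument side only, without importing boto:

```python
import argparse

# Sketch: default both credential flags to None instead of requiring them.
parser = argparse.ArgumentParser()
parser.add_argument('--aws-access-key-id', default=None)
parser.add_argument('--aws-secret-access-key', default=None)

# With neither flag given, both stay None; passing None/None to
# boto.connect_s3() makes boto resolve credentials itself, which on an
# EC2 instance with an IAM role means the role's temporary credentials.
args = parser.parse_args([])
print(args.aws_access_key_id)  # → None
```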

Enhancement: run snapshot locally for localhost

Currently the backup command takes --hosts option to run snapshot commands on each of the cassandra nodes. It would be useful if specifying "localhost" alone would run the snapshot command locally rather than via SSH.

I could see this leading to end users thinking they could have localhost as one node among many non-local nodes. So maybe a new command, like local_backup or something?
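The dispatch itself could be as small as this (names are illustrative; the remote branch would keep using Fabric as today):

```python
import subprocess

LOCAL_NAMES = {'localhost', '127.0.0.1'}

def run_on(host, command):
    """Run `command` directly when the target is this machine, else go over SSH."""
    if host in LOCAL_NAMES:
        return subprocess.call(command, shell=True)
    raise NotImplementedError('remote hosts still go through fabric sudo()/run()')
```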

AccessDenied error with encryption

I've just set up the snapshotter, and when I run the command it compresses and seemingly transfers everything to S3 (looks like it's all there), but then it errors at the end on the nodetool ring command, as follows:

[hostname] run: /usr/bin/nodetool ring
Traceback (most recent call last):
  File "/usr/local/bin/cassandra-snapshotter", line 9, in <module>
    load_entry_point('cassandra-snapshotter==0.5.0', 'console_scripts', 'cassandra-snapshotter')()
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/main.py", line 280, in main
    run_backup(args)
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/main.py", line 76, in run_backup
    worker.snapshot(snapshot)
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/snapshotting.py", line 351, in snapshot
    self.write_ring_description(snapshot)
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/snapshotting.py", line 400, in write_ring_description
    self.write_on_S3(snapshot.s3_bucket, ring_path, content)
  File "/usr/local/lib/python2.7/site-packages/cassandra_snapshotter/snapshotting.py", line 394, in write_on_S3
    key.set_contents_from_string(content)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 1426, in set_contents_from_string
    encrypt_key=encrypt_key)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 665, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 940, in _mexe
    request.body, request.headers)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 884, in sender
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>6C23CB43B06B5909</RequestId><HostId>z6yqnOVBCfLjnF0xTB1jyZaKpsVd4Q+Vgizvl9iJCxbgH573MFXuu9UCfCKlv1nvn2Fp/Ronxlo=</HostId></Error>

I've just discovered that this is happening because of encryption. Without the --s3-ssenc flag and with the bucket encryption policy removed, everything completes. Any known reason why the nodetool ring fails when encryption is enabled?

S3 bucket policy

{
    "Version": "2012-10-17",
    "Id": "PutObjPolicy",
    "Statement": [
        {
            "Sid": "DenyUnEncryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::s3-bucket-name/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        }
    ]
}

Incremental removes files for upload even if the upload fails

I've been reading the code before trying this out, and I noticed that when incremental_backups is True, put_from_manifest will delete files even if they failed to upload:

...
for ret in pool.imap(upload_file,
        ((bucket, f, destination_path(s3_base_path, f), s3_ssenc, buffer_size) for f in files)):
    if not ret:
        exit_code = 1
        break
pool.terminate()
if incremental_backups:
    for f in files:
        os.remove(f)  # DELETING FILES HERE EVEN IF exit_code == 1
exit(exit_code)

This doesn't seem wise. The easy solution is to make deletion conditional on the value of exit_code. It would be neater to delete only the files that uploaded correctly and leave the ones that didn't alone, but maybe that's overkill.

Tag the 0.4.0 release

It'd be cool if the hash that corresponds to what went out as 0.4.0 were tagged, and this tag pushed to the repo.

Somewhat akin to #49

Thanks

Restore capability

I know that you used sstableloader for restore. Do we want to do that, or just scp the files and have the agent uncompress them?

Or have the agent download the files and restore them itself: a reverse of the backup?

pip install cassandra_snapshotter does not work

Getting the following error when trying to run pip install; running Fedora 23, with a similar issue on AMZN Linux and Ubuntu.

$ pip install cassandra_snapshooter
Collecting cassandra-snapshooter
Could not find a version that satisfies the requirement cassandra-snapshooter (from versions: )
No matching distribution found for cassandra-snapshooter

flush before incremental backup upload?

In the current implementation you run nodetool flush before the incremental backup upload. We were thinking of letting Cassandra flush whenever it feels it needs to, and uploading the files it generates between runs.

Is calling flush explicitly something you added after first letting Cassandra flush on its own, or haven't you tried it differently?

Script for restoring the snapshots from backup stored on s3?

A script to restore from S3 should cover:

  1. restore data on the same node
  2. restore data on complete cluster
  3. restore data on new node
  4. restore data on new cluster

TTL – how does it impact the SSTables?
- Need to test to make sure the expired tables do not come back.

Error: OSError: [Errno 13] Permission denied while taking incremental backup

We're facing an issue when trying to take the incremental backup using the cassandra_snapshotter tool.

We have a three-node cluster in the Amazon cloud. We've enabled incremental backups and are moving them to S3. But when we try to execute the snapshotter tool, it gives a permission denied error:

[ec2-154-421-401-82.compute-1.amazonaws.com] out: Traceback (most recent call last):
[ec2-154-421-401-82.compute-1.amazonaws.com] out:   File "/usr/local/bin/cassandra-snapshotter-agent", line 9, in <module>
[ec2-154-421-401-82.compute-1.amazonaws.com] out:     load_entry_point('cassandra-snapshotter==0.4.0', 'console_scripts', 'cassandra-snapshotter-agent')()
[ec2-154-421-401-82.compute-1.amazonaws.com] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 266, in main
[ec2-154-421-401-82.compute-1.amazonaws.com] out:     args.incremental_backups
[ec2-154-421-401-82.compute-1.amazonaws.com] out:   File "/usr/local/lib/python2.7/dist-packages/cassandra_snapshotter/agent.py", line 151, in put_from_manifest
[ec2-154-421-401-82.compute-1.amazonaws.com] out:     os.remove(f)
[ec2-154-421-401-82.compute-1.amazonaws.com] out: OSError: [Errno 13] Permission denied: '/cassandra_data/data/test_bkp/t-9bc68ac040d511e5bab8a38736f22d26/backups/test_bkp-t-ka-2-Statistics.db'

It looks like some sort of permission error, but we've given full permissions to the (t-9bc68ac040d511e5bab8a38736f22d26/) directory with no luck; we also ran the tool from the root account but still see the same error.
We also cleared all the snapshots in S3 and tried to execute the snapshotter again, and it worked fine, but after inserting some data and running the tool again, the above error returns.

Incremental backups fail when manifest.json not in root of bucket prefix

Given an S3 bucket with paths like cassandra-backups/staging/20141211231905... this line returns an array like [u'staging/', u'staging/20141211231905/']. The get_contents_as_string() call then tries to look up a path like cassandra-backups/staging/manifest.json which of course fails with a 404.

Here's a backtrace:

Traceback (most recent call last):
  File "/usr/bin/cassandra-snapshotter", line 9, in <module>
    load_entry_point('cassandra-snapshotter==0.3.0', 'console_scripts', 'cassandra-snapshotter')()
  File "/usr/lib/python2.6/site-packages/cassandra_snapshotter-0.3.0-py2.6.egg/cassandra_snapshotter/main.py", line 198, in main
    run_backup(args)
  File "/usr/lib/python2.6/site-packages/cassandra_snapshotter-0.3.0-py2.6.egg/cassandra_snapshotter/main.py", line 38, in run_backup
    table=args.table
  File "/usr/lib/python2.6/site-packages/cassandra_snapshotter-0.3.0-py2.6.egg/cassandra_snapshotter/snapshotting.py", line 435, in get_snapshot_for
    for snapshot in self:
  File "/usr/lib/python2.6/site-packages/cassandra_snapshotter-0.3.0-py2.6.egg/cassandra_snapshotter/snapshotting.py", line 445, in __iter__
    self._read_s3()
  File "/usr/lib/python2.6/site-packages/cassandra_snapshotter-0.3.0-py2.6.egg/cassandra_snapshotter/snapshotting.py", line 418, in _read_s3
    manifest_data = mkey.get_contents_as_string()
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 1780, in get_contents_as_string
    response_headers=response_headers)
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 1648, in get_contents_to_file
    response_headers=response_headers)
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 1480, in get_file
    query_args=None)
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 1512, in _get_file_internal
    override_num_retries=override_num_retries)
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 343, in open
    override_num_retries=override_num_retries)
  File "/usr/lib/python2.6/site-packages/boto-2.34.0-py2.6.egg/boto/s3/key.py", line 303, in open_read
    self.resp.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>cassandra-backups/manifest.json</Key><RequestId>A777F43BA4BCC13F</RequestId><HostId>/qU27mGG5n7cPd4KP8UfcaiwTjGNzCP58HakJqYVhIn6KFWBoS6ZBps4pSj7btI/</HostId></Error>
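The listing should select only keys that actually end in /manifest.json, rather than assuming every prefix under the base path holds one. A minimal pure-Python sketch of that filtering (the function name snapshot_manifest_keys is hypothetical, not the project's API):

```python
def snapshot_manifest_keys(keys, base_path):
    """Keep only real snapshot manifest keys from a flat S3 listing.

    Intermediate prefixes like 'staging/' are skipped, so the code
    never fetches '<base_path>manifest.json' and hits a 404.
    """
    return [key for key in keys
            if key.startswith(base_path) and key.endswith('/manifest.json')]
```

With boto, the same idea applies to the Key objects returned by bucket.list(prefix=base_path): filter on key.name before calling get_contents_as_string().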

Error when taking backups

[172.31.23.194] sudo: python -c "import os; print os.path.join(['/var/lib/cassandra/data/', '', '', 'snapshots', '20140418145633', ''])"
[172.31.23.194] sudo: python -c "import glob; print '\n'.join(glob.glob('/var/lib/cassandra/data///snapshots/20140418145633/*'))"
[172.31.23.194] put: /tmp/tmpghxPeI -> /tmp/tmpghxPeI

Fatal error: put() encountered an exception while uploading '/tmp/tmpghxPeI'

Underlying exception:
Permission denied

Aborting.

Fatal error: One or more hosts failed while executing task 'upload_node_backups'

Fatal error: Needed to prompt for a connection or sudo password

I got an error when executing the backup command:

cassandra-snapshotter --s3-bucket-name=XXXX \
                      --s3-bucket-region=us-east-1 \
                      --s3-base-path=XXXXX \
                      --aws-access-key-id=XXXXXXXXXXXX \
                      --aws-secret-access-key=XXXXXXXXXXXXXXXXXXX \
                      --s3-ssenc \
                      backup \
                      --hosts=lab_db \
                      --user=cassandra

[lab_db] Executing task 'node_start_backup'

Fatal error: Needed to prompt for a connection or sudo password (host: lab_db), but input would be ambiguous in parallel mode

Aborting.

Fatal error: One or more hosts failed while executing task 'node_start_backup'

Aborting.
[lab_db] Executing task 'clear_node_snapshot'
[lab_db] run: /usr/bin/nodetool clearsnapshot -t "20151211141001"

Fatal error: Needed to prompt for a connection or sudo password (host: lab_db), but input would be ambiguous in parallel mode

Aborting.

Fatal error: One or more hosts failed while executing task 'clear_node_snapshot'

Aborting.
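Fabric runs these tasks in parallel mode, so it cannot stop to ask for a connection or sudo password; the usual remedy is key-based SSH plus passwordless sudo for the backup user on every node. A sketch under assumptions (a node reachable as lab_db, a remote user cassandra; adjust both to your environment):

```shell
# 1. Install your public key so Fabric never prompts for a connection password:
ssh-copy-id cassandra@lab_db

# 2. On the node, allow passwordless sudo for that user, so parallel
#    sudo calls (nodetool, file access) succeed without a prompt:
echo 'cassandra ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/cassandra-snapshotter
sudo chmod 0440 /etc/sudoers.d/cassandra-snapshotter
```

Granting NOPASSWD: ALL is the bluntest option; restricting it to the specific commands the snapshotter runs is safer if your policy allows.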

conf path required by agent

I get the following error when using cassandra-snapshotter:
out: cassandra-snapshotter-agent create-upload-manifest: error: argument --conf_path is required

My command is
cassandra-snapshotter --aws-access-key-id=SOMETHING --aws-secret-access-key=SOMEKEY --s3-bucket-name=mybucket --s3-bucket-region=us-east-1 --s3-base-path=dirname backup --hosts=54.x.x.x --user=username

This command worked fine earlier but fails after I used pip install to upgrade to the latest version on the node.

Please help me debug this further and fix it.
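A plausible cause is a version mismatch: the cassandra-snapshotter CLI on the control host builds the remote cassandra-snapshotter-agent command line, and if the agent installed on the node is a different release, a flag such as --conf_path can be required by one side but not supplied by the other. A sketch of how to check and, as a stopgap, pin matching versions (0.3.0 is taken from the tracebacks above; treat it as an example):

```shell
# Compare the installed version on the control host and on each node:
pip show cassandra-snapshotter

# Stopgap: pin the release that last worked on both sides, then
# upgrade control host and nodes together once confirmed.
pip install 'cassandra-snapshotter==0.3.0'
```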

Handle S3 buckets with dots in their names

Currently this happens:

ssl.CertificateError: hostname 'my.bucket.name.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

This can usually be worked around by using path-style addressing, i.e. putting the bucket name at the beginning of the path instead of in the hostname.
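With virtual-host addressing the bucket name becomes part of the TLS hostname, and a name containing dots cannot match the *.s3.amazonaws.com wildcard certificate. A small sketch of building path-style URLs instead (the function name is hypothetical; the legacy s3-&lt;region&gt; host form is assumed for regions other than us-east-1):

```python
def s3_path_style_url(bucket, key, region='us-east-1'):
    """Build a path-style S3 URL.

    The hostname stays a fixed Amazon endpoint that the wildcard
    certificate covers; the dotted bucket name moves into the path.
    """
    if region == 'us-east-1':
        host = 's3.amazonaws.com'
    else:
        host = 's3-%s.amazonaws.com' % region
    return 'https://%s/%s/%s' % (host, bucket, key)
```

With boto 2.x the equivalent is to connect with boto.connect_s3(calling_format=boto.s3.connection.OrdinaryCallingFormat()), which forces path-style requests.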
