
duplicacy's People

Contributors

a-s-z-home, amarcu5, countextreme, ech1965, fracai, gilbertchen, jeffaco, jer-sen, jtackaberry, jxors, leftytennis, lowne, markfeit, michaelcinquin, mikecook, niknah, northnose, pawitp, pdf, perpetual-hydrofoil, philband, plasticrake, rsanger, s4y, sdaros, sevimo123, smt, stefandz, thebestpessimist, thenickdude

duplicacy's Issues

Heavy Class C API use for B2

2,500 Class C calls are allowed per day for free; beyond that you get "B2 cap exceeded" errors (or you have to raise your cap).

Transactions Class C (the first 2,500 of these calls are free each day, then):

b2_authorize_account              $0.004 per 1,000
b2_create_bucket                  $0.004 per 1,000
b2_list_buckets                   $0.004 per 1,000
b2_list_file_names                $0.004 per 1,000
b2_list_file_versions             $0.004 per 1,000
b2_update_bucket                  $0.004 per 1,000
b2_list_parts                     $0.004 per 1,000
b2_list_unfinished_large_files    $0.004 per 1,000
b2_get_download_authorization     $0.004 per 1,000
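
For a rough sense of scale (assuming the pricing above, not an official quote): a day in which duplicacy issues 50,000 Class C calls would cost about (50,000 - 2,500) / 1,000 x $0.004, roughly $0.19 for that day, and it also exceeds the free daily cap about twentyfold.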

OpenStack Swift support

Hi, it would be very interesting if you added support for OpenStack Swift.
Some cloud providers use OpenStack Swift as a backend, so by implementing this one API you would cover all of them.

Request: Improve restore speed by downloading multiple chunks in parallel

Currently when restoring data, chunks are downloaded one after the other. When using a backend with slow response times, this can lead to a lot of downtime between active transfers. Chunks themselves will download quickly, but there will then be a pause before the next one commences. For instance Google Drive tends to have a 1-3s delay when requesting downloads - this delay isn't unique to Duplicacy as other software that uses the Google Drive API also encounters this delay with downloads.

For instance, over the course of a restore, speeds tend to average about 2MB/s (16Mbit/s) on my 100Mbit connection - utilizing about 1/6th of total available bandwidth.

While quite involved, for larger restores it is currently possible to manually run several instances of Duplicacy when restoring - each working on a different include/exclude filter set and restoring to separate folders - and to manually merge the data after restoring is finished. I've been able to significantly increase download speed by doing this.

Ideally though Duplicacy would handle this natively. Would it be possible, in the future, to give Duplicacy the ability to download more than one chunk simultaneously when restoring, in order to improve download efficiency? Perhaps something as simple(TM) as beginning the download of the next x chunks while working on the currently active one. A --threads parameter for restore would be ideal.

Doing this would have a significant impact on the time taken to complete a restore.
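
A minimal sketch of what a --threads style restore loop could look like, assuming a hypothetical downloadChunk helper (this is not duplicacy's actual API): a fixed pool of workers pulls chunk hashes from a channel, so several downloads are in flight at once.

package main

import (
	"fmt"
	"sync"
)

// downloadChunk is a hypothetical stand-in for whatever fetches one chunk
// from the storage backend; it is not duplicacy's real API.
func downloadChunk(hash string) ([]byte, error) {
	// ... perform the actual HTTP/SFTP request here ...
	return []byte{}, nil
}

// downloadChunks fetches all chunks using `threads` concurrent workers and
// returns them keyed by hash, so the restore loop can consume them in order.
func downloadChunks(hashes []string, threads int) map[string][]byte {
	jobs := make(chan string)
	results := make(map[string][]byte)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for hash := range jobs {
				data, err := downloadChunk(hash)
				if err != nil {
					fmt.Println("failed to download chunk", hash, err)
					continue
				}
				mu.Lock()
				results[hash] = data
				mu.Unlock()
			}
		}()
	}

	for _, hash := range hashes {
		jobs <- hash
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	chunks := downloadChunks([]string{"chunk-a", "chunk-b", "chunk-c"}, 4)
	fmt.Println("downloaded", len(chunks), "chunks")
}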

Specify SFTP port?

Currently it seems the SFTP port is hardcoded to 22. Not all SFTP servers listen on port 22.

Error I get:
$ duplicacy_win_x64_0.1.10.exe init repo sftp://my.domain.org:5525/path/to/repository
Failed to load the SFTP storage at sftp://my.domain.org:7000/path/to/repository:
dial tcp: too many colons in address my.domain.org:7000:22
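
The "too many colons" error suggests that ":22" is appended even when the URL already carries a port. A hedged sketch of how the address could be derived instead, defaulting to 22 only when no port is given (hostPort is an illustrative helper, not duplicacy code):

package main

import (
	"fmt"
	"net/url"
)

// hostPort extracts "host:port" from an sftp:// storage URL, defaulting to
// port 22 only when the URL does not already carry one. Appending ":22"
// unconditionally is what produces "too many colons in address".
func hostPort(storage string) (string, error) {
	u, err := url.Parse(storage)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		port = "22"
	}
	return u.Hostname() + ":" + port, nil
}

func main() {
	addr, err := hostPort("sftp://my.domain.org:5525/path/to/repository")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(addr) // prints my.domain.org:5525
}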

enumerate chunks for a given pattern

Hello,

I'm evaluating duplicacy and am narrowing in on a particular mode of operation:

  • for a variety of reasons (performance, reliability, unsupported backend), I am abstracting away the relay of chunks to cloud storage (rclone 32 parallel uploads, etc)
  • in this mode of operation, I cache chunks locally, upload them to a cloud storage, then truncate locally
  • this preserves the file name and location to enable dedupe, but does not occupy local storage

In this mode of operation, I can quickly download chunks via parallelism. When I have the chunks (and snapshot files) downloaded and available locally, I can recover quickly. This does not work, however, if I do not have sufficient local disk space to bring the entire collection of chunks down.

If duplicacy could enumerate chunks necessary for a specific restore pattern (say, all my cat pictures only), I could prime the restore by first obtaining only those chunks necessary for the restore prior to running the restore job. At present, I am only able to enumerate chunks for a given snapshot. It is feasible to try to cut my dataset into smaller logical repos, each independently backed up/snapshotted, such that a full restore could be performed repo by repo, but that is laborious.

So the feature request is to add chunk enumeration for a given restore pattern.
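
A rough sketch of the requested enumeration, assuming a hypothetical in-memory view where each file in a snapshot lists the chunk hashes it references (this is not duplicacy's real snapshot format): filter the file list by a glob pattern and collect the union of referenced chunks.

package main

import (
	"fmt"
	"path/filepath"
)

// FileEntry is a hypothetical view of one file in a snapshot: its path and
// the hashes of the chunks its content is split into.
type FileEntry struct {
	Path   string
	Chunks []string
}

// chunksForPattern returns the set of chunk hashes needed to restore every
// file whose path matches the given glob pattern.
func chunksForPattern(files []FileEntry, pattern string) []string {
	seen := make(map[string]bool)
	var needed []string
	for _, f := range files {
		matched, err := filepath.Match(pattern, f.Path)
		if err != nil || !matched {
			continue
		}
		for _, hash := range f.Chunks {
			if !seen[hash] {
				seen[hash] = true
				needed = append(needed, hash)
			}
		}
	}
	return needed
}

func main() {
	files := []FileEntry{
		{Path: "pictures/cats/tom.jpg", Chunks: []string{"c1", "c2"}},
		{Path: "documents/notes.txt", Chunks: []string{"c2", "c3"}},
	}
	fmt.Println(chunksForPattern(files, "pictures/cats/*")) // prints [c1 c2]
}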

history not working on remote sftp storage

I want to use duplicacy to do backups to an external server via SFTP.
I can set up the repository and do backups. I can restore files from SFTP.
So far it seems to work. BUT history is not working.

this is with a local repository

/opt/duplicacy list -a
Storage set to /data/backup/data/test/backup/
Snapshot seafile revision 1 created at 2016-11-03 17:04 -hash
Snapshot seafile revision 2 created at 2016-11-03 17:04

/opt/duplicacy history eins.txt
Storage set to /data/backup/data/test/backup/
      1:               6 2016-11-03 17:04:01 622cb3371c1a08096eaac564fb59acccda1fcdbe13a9dd10b486e6463c8c2525 eins.txt
      2:              11 2016-11-03 17:04:40 464e4dd16ebd5affe3a43ca8a4a9f623e96268cf8aba762fe5cec827902a268d eins.txt*
current:              11 2016-11-03 17:04:40                                                                  eins.txt

now the same with a remote repository via sftp

/opt/duplicacy list -a
Storage set to sftp://[email protected]/data
Enter SSH password:********************
Snapshot seafile revision 1 created at 2016-11-04 10:13 -hash
Snapshot seafile revision 2 created at 2016-11-04 10:14
Snapshot seafile revision 3 created at 2016-11-04 10:14

/opt/duplicacy history eins.txt
Storage set to sftp://[email protected]/data
Enter SSH password:********************
No file eins.txt found in snapshot seafile at revision 1

/opt/duplicacy history -r 3 eins.txt
Storage set to sftp://[email protected]/data
Enter SSH password:********************
      3:              11 2016-11-04 10:14:31 f950375066d74787f31cbd8f9f91c71819357cad243fb9d4a0d9ef4fa76709e0 eins.txt
current:              16 2016-11-04 10:17:50                                                                  eins.txt*

So history is not working: I can retrieve info for a specific revision, but not the full history. The behaviour of "history" differs between local and SFTP storage.

I can restore the file to a specific revision; that is working.

markus

Empty folders possibly not being recorded in snapshots

I've noticed that empty folders don't show in a snapshot's files list. Is this intended behavior, or a possible bug?

Steps to reproduce:

  • Include an empty folder in a repository and run a backup
  • List the files in the most recent revision, either from the command line or GUI.
  • Note the absence of the empty folder

Using 1.1.4 CLI on Windows 10 x64

Different repositories at common root with different filters

Just exploring the system and would like guidance on the preferred way to handle having different backup sets (repositories?) with a common root but different filters.

For example: if I want to have two backup sets for personal media and general media, I imagine I should initialise two repositories. However, as they would both have the same root (say C:\Users\martin\media), how do I then specify different include paths for each, given that, as I understand from the documentation, there can only be one .duplicacy/filters file?

So I'd like to achieve the following:

general-media
+C:\Users\martin\media\film
+C:\Users\martin\media\music

personal-media
+C:\Users\martin\media\photo
+C:\Users\martin\media\video

Restore from storage without knowing snapshot id

It does not appear possible to create a new repository from scratch and link it to a previously created repository residing on remote storage without knowing the snapshot id used by the former repository.

For example, if the local repository is completely lost and you want to recover by restoring from the remote storage, you might want to inspect the remote storage to see which snapshot IDs are available. There seems to be no way to find what exists on the remote storage independent of a properly initialized local repository. However, if you know the snapshot id you want, you can init a new local repository with the proper snapshot id and find the snapshot to begin restore of the desired revision. It would be nice to not have to remember the correct snapshot ID to be able to recover from a catastrophic event.
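
As a stopgap, the available snapshot IDs can often be discovered by listing the snapshots directory on the storage itself, assuming the storage uses a snapshots/<id>/<revision> layout as the local and SFTP backends appear to. A hedged sketch for a storage reachable as a local path (the path below is just an example):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Example path to the storage (could also be an SFTP mount point).
	storage := "/mnt/data/backup"

	// Assumption: each snapshot ID is a subdirectory of <storage>/snapshots.
	entries, err := os.ReadDir(filepath.Join(storage, "snapshots"))
	if err != nil {
		fmt.Println("cannot list snapshots:", err)
		os.Exit(1)
	}
	for _, entry := range entries {
		if entry.IsDir() {
			fmt.Println("snapshot id:", entry.Name())
		}
	}
}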

FUSE Read Only Snapshot

It would be amazing if we could mount a single snapshot read-only, the way btrfs snapshots can be mounted.

Or, even better, if we could mount all the snapshots and browse the backup as a file-versioning filesystem.

Prune command on Backblaze B2

In the release notes of version 1.1.0, it is indicated that:

Backblaze B2 storage now supports cross-computer deduplication (-exclusive is no longer required)

However, when I run the prune command, I get the following error:

The --exclusive option must be enabled for storage b2://XXX

No compression level in 2.0

GUIDE.md:
[screenshot: duplicacy-cli-guide.md at master, 2017-05-17, showing the init options]

While Duplicacy 2.0.0 gives me that

NAME:
   duplicacy init - Initialize the storage if necessary and the current directory as the repository

USAGE:
   duplicacy init [command options] <snapshot id> <storage url>

OPTIONS:
   -encrypt, -e 		encrypt the storage with a password
   -chunk-size, -c 4M 		the average size of chunks
   -max-chunk-size, -max 16M 	the maximum size of chunks (defaults to chunk-size * 4)
   -min-chunk-size, -min 1M 	the minimum size of chunks (defaults to chunk-size / 4)

No compression for duplicacy?

Multiple 'chunks' folders present in Google Drive remote storage

When I first initialized the remote storage, there were 3 folders present (chunks, fossils and snapshots). There are now 256 extra empty folders under 'chunks', named 'chunks/00', 'chunks/01', ..., 'chunks/fe', 'chunks/ff'.

Google Drive screenshot: https://imgur.com/11W3DWI

Is this intended behavior, or has something gone wrong somewhere? I believe they appeared after adding a second repository - the CLI was reporting 'Listing Chunks' or something similar at the time they appeared - I could be wrong though. I should be able to reproduce and be more specific if this behavior isn't intended.

Using 1.1.3 CLI version on Windows 10 x64.

edit I can reproduce this simply by running a 'check'. The folders appear when the CLI reports 'Listing all chunks'.

S3 on Linux can't login

duplicacy init postgres_s3 s3://[email protected]/postgres-backup

Enter S3 Access Key ID:KJCBWECJBNKEWJCN
Enter S3 Secret Access Key:lekrfnlLKNLNIHUO8KJNDCS89HNKJNLNlkn
Failed to configure the storage: The request signature we calculated does not match the signature you provided. Check your key and signing method.

I'm 100% sure credentials are good.
I use them in s3cmd

Same on Mac btw.
[screenshot: duplicacy trial edition, expires in 29 days, 2017-05-12]

Request: Make use of deduplication during restore

For a test scenario, I backed up a folder containing 5 copies of the same 200mb file. When backing up the folder the deduplication worked as expected - storing all 1GB worth, though only needing to transfer 200MB - however when it came to running a test restore, each file was restored separately, presumably redownloading the exact same set of chunks in order to do so.

Would it be at all possible to rework the way in which files are restored in order to optimize the amount of data transferred? I imagine this would come at the cost of more processor overhead, having to work out which files share which chunks prior to, or during, the restore itself. Perhaps chunk downloads could be cached and only discarded once they are known to be no longer needed. For the sake of efficiency, any processing would be done in parallel to the chunk downloads.
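
One possible shape for this, as a hedged sketch only: keep a cache of chunks already downloaded during the restore, keyed by hash, and consult it before asking the backend again (downloadChunk is a hypothetical fetch function, not duplicacy's real API):

package main

import (
	"fmt"
	"sync"
)

// downloadChunk is a hypothetical stand-in for fetching one chunk from the
// storage backend.
func downloadChunk(hash string) []byte {
	fmt.Println("downloading", hash)
	return []byte("data for " + hash)
}

// chunkCache remembers chunks already downloaded during this restore, so
// files sharing chunks do not trigger repeated downloads. (A real
// implementation would bound the cache size and would not hold the lock
// while downloading.)
type chunkCache struct {
	mu     sync.Mutex
	chunks map[string][]byte
}

func (c *chunkCache) get(hash string) []byte {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.chunks[hash]; ok {
		return data
	}
	data := downloadChunk(hash)
	c.chunks[hash] = data
	return data
}

func main() {
	cache := &chunkCache{chunks: make(map[string][]byte)}
	// Five identical files referencing the same chunk: only one download happens.
	for i := 0; i < 5; i++ {
		_ = cache.get("5148711f")
	}
}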

Request: mega.nz storage

I hope this is not the wrong place for this, but I would really love to see support for uploading to mega.nz. I know that it may be a bit hard because all the encryption needs to be done client side, but it is what I currently use, and it would be awesome to upload directly from your app and not have to use a workaround.
Thanks for reading. :)

Access denied listing subdirectory on Windows 10

Running a backup on Windows 10, duplicacy 1.1.6 is giving the following error many times for different paths:

Failed to list subdirectory: 
open C:\Users\martin/.duplicacy\shadow\\Users\martin/data/identities/personal/project/mithril-boilerplate/node_modules/watchify/node_modules/browserify/node_modules/insert-module-globals/node_modules/lexical-scope/node_modules/astw/node_modules/esprima-fb/examples: 
Access is denied.

Is this a long path issue?

Note: In an Admin shell, backup executed using C:\Users\martin>duplicacy backup -stats -vss

Backup empty folders?

Hi.

I've come across a problem.
While backing up to S3 with duplicacy, the PostgreSQL PGDATA folder has some empty folders in it,
pg_notify for example.

I've noticed that duplicacy doesn't do anything with empty folders.
It just won't back them up.
In such a scenario I'm unable to restore Postgres to a working state.
UTC [18091] FATAL: could not open directory "pg_notify": No such file or directory

Is there any switch to enable backing up empty folders, or is it a bug?

Proof of performance?

Admittedly I'm in over my head here because I'm not a Computer Scientist, but I wonder if there is anything like a formal proof that your technology and algorithms are completely data safe? (I'm not talking about implementation bugs, but the actual formal design.)

Include/exclude file patterns

It seems that the Exclude/include file patterns documentation needs some updates.

  • The include mark is '+', while the exclude one is '-'. This is reversed in the documentation.
  • Documentation should mention that there is no space after the mark.
  • Adding an example of a filter file would be useful.

I found that when you want to include a file, all the directories between the repository and the file need to be added. For example, if the repository is at / and I want to include the file /var/lib/app/myfile, I need to use the following includes:

+var/
+var/lib/
+var/lib/app/
+var/lib/app/myfile

I was expecting that adding +var/lib/app/myfile would be enough.

Finally, it would be nice to be able to include files that are outside the repository. In my case, I need to backup multiple files/directories, for which the common folder is the root one (/). I would like to put the repository elsewhere.
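
Until the semantics are clarified in the documentation, those parent includes can be generated mechanically. A hedged sketch (includeLines is just an illustrative helper) that expands one wanted path into the include lines shown above:

package main

import (
	"fmt"
	"strings"
)

// includeLines expands a single path (relative to the repository root) into
// the '+' include patterns for every parent directory plus the file itself,
// matching the behaviour described above.
func includeLines(path string) []string {
	parts := strings.Split(path, "/")
	var lines []string
	for i := 1; i < len(parts); i++ {
		lines = append(lines, "+"+strings.Join(parts[:i], "/")+"/")
	}
	return append(lines, "+"+path)
}

func main() {
	for _, line := range includeLines("var/lib/app/myfile") {
		fmt.Println(line)
	}
	// A trailing "-*" would still be needed in the filters file to exclude the rest.
}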

Can not copy from local disc to sftp.

I tried to create a second storage option with duplicacy add -copy default server oma sftp://<some ip here>/backup

After that I tried to copy the backup from the local disc to the SFTP server with this command:
duplicacy copy -to server

and get this response:

Copying snapshots to sftp://<some ip here>/backup was disabled by the preference

Did I do something wrong or is this a bug?

Choose file at restore

Hi gilbertchen,

First of all, I've been testing this application with B2 storage and it works fine, but although I have the option no_save_password set to false, it doesn't save the account id/key; it asks for the B2 credentials for every command that I execute.

In addition, I have a suggestion that I think could be useful for all users: restoring a single file from a given snapshot to a desired destination.

Backup hanging with sftp

I am seeing the backup process hang without any information about what went wrong. This is over SFTP.

For example, running a backup to a remote machine today, I checked output at 18:27 to see the process had appeared to hang at 17:27

2017-01-01 17:27:01.194 INFO UPLOAD_PROGRESS Skipped chunk 378 size 6580432, 17.11MB/s 04:17:29 0.6%
2017-01-01 17:27:01.491 INFO UPLOAD_PROGRESS Skipped chunk 379 size 11666325, 17.22MB/s 04:15:56 0.6%
2017-01-01 17:27:02.409 INFO UPLOAD_PROGRESS Skipped chunk 381 size 1115549, 17.07MB/s 04:18:09 0.6%
2017-01-01 17:27:03.311 INFO UPLOAD_PROGRESS Skipped chunk 382 size 5007440, 16.96MB/s 04:19:51 0.7%

Note that the remote host is still accessible and I can ssh in fine from the same machine running the backup.

Q: support for road warriors?

The best way to illustrate this question is with a scenario:

  1. baseline: My wife and I have identical filesets on portable computers.
  2. She modifies a local copy of a file.
  3. I modify a local copy of the same file in a different manner.

At this point, both of our local copies are valid but we don't know about the modifications made by the other party.

  1. I sync the master fileset with duplicacy.

What happens when she syncs?

WebDAV Support

I'm not sure whether WebDAV support is available or not.

I'm wondering if this is possible to add, or if there is another way/workaround to allow me to back up to a drive attached via the WebDAV protocol.

licensing? unsigned binaries? end-user's perspective

If the usability claims made in README.md are true, duplicacy has the potential to become a valued part of a sysadmin's toolbox. As of this writing there are some wrinkles, though, which could and should be explicitly addressed:

Licensing model?

What are the intentions here?

Opening the sources / code review is key for broad(er) adoption.

  • Case 1 - Duplicacy is the best thing since sliced bread. Still, bugs happen. The test suite could always be expanded for weird corner cases.
  • Case 2 - The ability to restore data from one's backups after a day, a month or 10 years is rather essential. A Duplicacy backup sitting in S3 or Glacier is now the last and only remaining source of precious data after a few years of seamless backups and restores. You have run into an issue that didn't manifest itself before and are now out of ideas. The original sole author/organization supporting it is now MIA and you can't troubleshoot on your own. Was using a cloud backend with duplicacy the best use of your money and time?
  • Case 3 - Not insinuating anything, just playing Devil's Advocate: what lurks in the unsigned binaries? How do I know the binary is published by someone with an established web of trust and not by a nefarious entity patiently waiting to trigger a ransomware attack at some later date?

Backups need to be boring.

Regardless of whether you plan to back up megabytes or petabytes. Playing with unsigned binaries on dummy data and getting excited by cool features is one thing; using them on valuable data and hoping for the best is another (anything worth backing up is valuable, right?). This becomes especially true when one has also been footing the bill for remote cloud storage and the associated I/O requests for the lifetime of the backed-up data.

S3 eventual consistency

Just wondering if S3's eventual consistency causes a problem. My understanding is that S3's rename is a copy followed by a delete, and that the delete part of that process is an "eventually" consistent operation. If a backup starts very soon after fossil collection ends, and at the very end of the fossil collection a chunk rename (i.e. delete) operation starts, eventually becoming consistent at some later time, then the backup could think the chunk properly exists when actually S3 is in the process of converting it into a fossil. The second-stage prune would think the backup wasn't concurrent, so it would happily delete the fossil rather than converting it back into a proper chunk. Is that a possible scenario?

Connecting to SFTP

First off, let me preface this by saying Filezilla works fine.

Since I have static IP addresses at both locations, I forwarded port 22 from the WAN to the NAS that hosts OpenSSH for SFTP, but only the public IP of the location where Duplicacy is being used is allowed to connect on port 22.

No matter what I do, Duplicacy will not connect. Duplicacy from the command line gives me:
ssh: handshake failed: EOF

Windows GUI gives me:

Duplicacy Error

ERROR Failed to load the SFTP storage at sftp://[email protected]/VMBACKUPS: Can't access the storage path VMBACKUPS: file does not exist

The snapshot ~ at revision 2 contains an error: The entry ~ appears before the entry ~

Hey,

In testing a few different scenarios, I've come across an interesting problem: I am able to complete a backup/snapshot, but when running the subsequent one, it fails with the following message:

The snapshot data at revision 2 contains an error: The entry /Sources/� Sanity check_ Is WiMAX almost here and will it unlock the n...pdf appears before the entry /Sources/» Sanity check_ Is WiMAX almost here and will it unlock the n...pdf

In looking at this, it appears the files are duplicates and contain strange filenames (windows sourced). One solution is removing the duplicate, or correcting the filenames. However, knowing which files are problematic ahead of time is difficult and having to re-run the snapshot/backup job is time consuming. Do you have any suggestions? Have you seen this before?

Is it possible to edit the snapshot file (encrypted) to correct it or is my only option to try to fix the filename/duplicate problem, delete the snapshot, and try again?

Thanks!

Upload failing with Dropbox backend

Last backup failed with:

2017-01-07 11:04:58.163 ERROR UPLOAD_CHUNK Failed to upload the chunk 5148711fa98f182c9fadb05d881b27a3be1fdd396259180f736a96e48c5e2c31:
goroutine 37 [running]:
runtime/debug.Stack(0x41c568, 0xc420495ad0, 0xc420495ab0)
/usr/local/go/src/runtime/debug/stack.go:24 +0x79
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/gilbertchen/duplicacy.CatchLogException()
/Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_log.go:161 +0x196
panic(0x8dc160, 0xc4305e2480)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/gilbertchen/duplicacy.logf(0x2, 0x9342f1, 0xc, 0x940223, 0x21, 0xc420495f30, 0x2, 0x2)
/Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_log.go:147 +0x1d8
github.com/gilbertchen/duplicacy.LOG_ERROR(0x9342f1, 0xc, 0x940223, 0x21, 0xc420495f30, 0x2, 0x2)
/Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_log.go:93 +0x73
github.com/gilbertchen/duplicacy.(*BackupManager).UploadChunk.func1(0xc430138010, 0xc4203c6060, 0xc422e87d00, 0xc420138000, 0xc42e3d0051, 0x48, 0xc422e87cc0, 0x40)
/Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_backupmanager.go:1745 +0x1fc
created by github.com/gilbertchen/duplicacy.(*BackupManager).UploadChunk
/Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_backupmanager.go:1752 +0x6fa

Question: Shadow volume and concurrent runs

On Windows 10 with the -vss flag: if I run duplicacy concurrently to two different storages for the same repo, I notice that when one process exits earlier (as it is writing to local storage) it logs that it is deleting the shadow volume in use.

Is this an issue? Would it cause a problem for the other process running or are the shadow volumes distinct per run?

Filters

I have recently had some trouble setting up a filter.
Basically I was trying to back up some specific folders, but exclude some subfolders within those:

-foo/bar/
+foo/
-*

That, however, didn't back up anything at all within foo.
Eventually I found a working solution using:

-foo/bar/
+foo/*
-*

However, based on the documentation, it seems to me that my first example should have worked as well?
The behaviour is the same in the Windows and Linux versions.

Suggestion: Check portion of files

Right now the check command either only verifies the existence of chunks, or verifies the hashes of all the chunks at once.
It might be useful to have an intermediate option which allows verifying a custom amount of data (duplicati has something similar).
B2, for example, allows 1 GB of free download each day, so you could schedule a free daily fractional verification.
The easiest approach would be to select samples at random; better still would be to keep track of when each chunk was last verified and cycle through them.
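
A hedged sketch of the random-sampling variant, assuming a hypothetical verifyChunk helper and a pre-built list of all chunk hashes; it verifies randomly chosen chunks until a daily byte budget (e.g. B2's free 1 GB of download) is used up:

package main

import (
	"fmt"
	"math/rand"
)

// verifyChunk is a hypothetical helper that downloads one chunk, checks its
// hash, and reports how many bytes were transferred.
func verifyChunk(hash string) (bytes int64, ok bool) {
	// ... download the chunk and compare its hash here ...
	return 4 * 1024 * 1024, true
}

// verifySample checks randomly chosen chunks until the byte budget runs out.
func verifySample(hashes []string, budget int64) {
	rand.Shuffle(len(hashes), func(i, j int) {
		hashes[i], hashes[j] = hashes[j], hashes[i]
	})
	var used int64
	for _, hash := range hashes {
		if used >= budget {
			break
		}
		bytes, ok := verifyChunk(hash)
		used += bytes
		if !ok {
			fmt.Println("corrupted chunk:", hash)
		}
	}
	fmt.Printf("verified %d bytes of a %d byte budget\n", used, budget)
}

func main() {
	chunks := []string{"aa11", "bb22", "cc33", "dd44"}
	verifySample(chunks, 1<<30) // roughly 1 GB daily budget
}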

Command to purge / totally remove backup folder etc

Hi gilbertchen,

Is there any cli command to totally remove all the chunks, folders, etc from a particular storage url? I tried prune - it deleted the files but the backup folder and sub folders are still there.

Strange output for a null size file alone in a directory.

Hello,

I don't know if it's a bug, but the output message is not very nice for a zero-size file... (Debug?!)

Everything looks good when there is a zero-size file plus another non-empty one... (even if I don't really like the output for the zero-size file...)

[root@8ad75214293c home]# uname -a
Linux 8ad75214293c 4.9.13-moby #1 SMP Sat Mar 25 02:48:44 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@8ad75214293c home]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[root@8ad75214293c home]#

[root@8ad75214293c home]# rm -r .duplicacy ; rm -r /mnt/data/backup ; mkdir /mnt/data/backup
[root@8ad75214293c home]# duplicacy_linux init 0 /mnt/data/backup
/home will be backed up to /mnt/data/backup with id 0
[root@8ad75214293c home]# ls -l
total 0
[root@8ad75214293c home]# touch a
[root@8ad75214293c home]# echo 'Hello the world' > b
[root@8ad75214293c home]# duplicacy_linux backup
Storage set to /mnt/data/backup
No previous backup found
Indexing /home
Packed b (16)
Backup for /home at revision 1 completed
[root@8ad75214293c home]# duplicacy_linux list -files
Storage set to /mnt/data/backup
Snapshot 0 revision 1 created at 2017-04-21 14:54 -hash
 0 2017-04-21 14:53:58                                                                  a
16 2017-04-21 14:54:06 00d0eaf33eedf0a80f68d75b6c4b8fe20be8ed2a978798458db30271608c4326 b
Files: 2, total size: 16, file chunks: 1, metadata chunks: 3
[root@8ad75214293c home]# duplicacy_linux cat -r 1 a
File a has mismatched hashes:  vs 0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8
File a is corrupted in snapshot 0 at revision 1
[root@8ad75214293c home]# duplicacy_linux cat -r 1 b
Hello the world

[root@8ad75214293c home]#

Now the case in question:

[root@8ad75214293c home]# rm -r .duplicacy ; rm -r /mnt/data/backup ; mkdir /mnt/data/backup
[root@8ad75214293c home]# duplicacy_linux init 0 /mnt/data/backup
/home will be backed up to /mnt/data/backup with id 0
[root@8ad75214293c home]# ls -l
total 0
[root@8ad75214293c home]# touch a
[root@8ad75214293c home]# duplicacy_linux backup
Storage set to /mnt/data/backup
No previous backup found
Indexing /home
Backup for /home at revision 1 completed
[root@8ad75214293c home]# duplicacy_linux list -files
Storage set to /mnt/data/backup
Snapshot 0 revision 1 created at 2017-04-21 15:01 -hash
0 2017-04-21 15:01:31                                                                  a
Files: 1, total size: 0, file chunks: 1, metadata chunks: 3
[root@8ad75214293c home]# duplicacy_linux cat -r 1 a
runtime error: index out of range
goroutine 1 [running]:
runtime/debug.Stack(0x22, 0x0, 0x0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/gilbertchen/duplicacy.CatchLogException()
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_log.go:166 +0xf4
panic(0x8c7de0, 0xc420010080)
        /usr/local/go/src/runtime/panic.go:458 +0x243
github.com/gilbertchen/duplicacy.(*SnapshotManager).RetrieveFile(0xc420512000, 0xc4200ba2d0, 0xc42007a100, 0xc4200efbd8, 0xc42007a100)
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_snapshotmanager.go:1041 +0x6c5
github.com/gilbertchen/duplicacy.(*SnapshotManager).PrintFile(0xc420512000, 0xc420011f1f, 0x1, 0x1, 0x7ffc1037c918, 0x1, 0x0)
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/duplicacy_snapshotmanager.go:1120 +0x360
main.printFile(0xc4200e65a0)
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/main/duplicacy_main.go:819 +0x3b0
github.com/gilbertchen/cli.Command.Run(0x949ede, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x969419, 0x53, 0x0, ...)
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/cli/command.go:160 +0xacd
github.com/gilbertchen/cli.(*App).Run(0xc4200e6360, 0xc42000a0f0, 0x5, 0x5, 0x0, 0x0)
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/cli/app.go:179 +0x919
main.main()
        /Users/chgang/zincbox/go/src/github.com/gilbertchen/duplicacy/main/duplicacy_main.go:1639 +0x4722
[root@8ad75214293c home]#

I'll keep watching for any additional help...
Thanks, Yvan

Sorry if my English is not very clear.

dbus-daemon launched but not terminated

I use duplicacy on a server running Ubuntu 16.04. I noticed that there are a couple of hundred dbus-daemon processes on the server:

ps -fu root | grep dbus-daemon | grep -v grep
root     32317     1  0 Sep27 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     32339     1  0 Nov08 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     32414     1  0 Oct08 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     32443     1  0 Oct08 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     32578     1  0 Oct23 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
[...]
ps -fu root | grep dbus-daemon | grep -v grep | wc -l
773

After some investigation, I found that duplicacy-cli is the one that creates them (via dbus-launch) but never kills them.

To reproduce, you can just run duplicacy-cli -h and see that each time there is an additional dbus-daemon process.

This causes the machine's memory usage to increase and will eventually prevent any new process from being created.

Support direct peer-to-peer connections via TCP hole-punching

Duplicacy allows peer-to-peer backup as long as the recipient is willing to set up an SFTP server, do the work of setting up port forwarding on their router for it, and expose that SSH daemon to the Internet for anyone else to try (hopefully unsuccessfully) to break into.

It would be great if instead Duplicacy supported direct peer-to-peer connections via TCP hole-punching for users whose NATs support it. That would eliminate the setup work required for users who might be less tech-savvy and might struggle to configure (without mistakes that could leave them vulnerable) the SFTP daemon plus the port forwarding on the router, and it would avoid opening a well-understood (though generally well-secured) door which any bad actor on the Internet could try to break into.

There are definitely some costs to implementing this: clearly there's a development cost (you have to implement the actual hole-punching code to get a TCP connection, and you have to design - and secure - a protocol of some sort for doing authentication and then the transfer of data across the TCP connection), and there's also an ongoing operational cost of having to run a server of some type to facilitate the hole-punching handshake (and you have to design the protocol that lets users match one another via your server prior to actually connecting to one another). But it would make for a very seamless user experience, and it's a feature at least some people would want (as evidenced by the number of us who ran CrashPlan peer-to-peer on our Raspberry Pis before Code42 released code that ran only on Intel processors), so I hope it's something you'll consider.

Stop at filesystem boundaries

It would be nice to have a way to stop duplicacy from descending into /proc without the kludge of explicitly excluding /proc. Having a flag to stop at filesystem boundaries seems the obvious solution.
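
For reference, one common way to implement such a boundary check on Unix is to compare device IDs from stat; a hedged sketch (not duplicacy code) is below:

package main

import (
	"fmt"
	"os"
	"syscall"
)

// sameFilesystem reports whether two paths live on the same device, which is
// one way a backup tool could decide not to descend into /proc, /sys, or
// other mounts (Unix-only; uses the st_dev field from stat).
func sameFilesystem(a, b string) (bool, error) {
	sa, err := os.Stat(a)
	if err != nil {
		return false, err
	}
	sb, err := os.Stat(b)
	if err != nil {
		return false, err
	}
	da := sa.Sys().(*syscall.Stat_t).Dev
	db := sb.Sys().(*syscall.Stat_t).Dev
	return da == db, nil
}

func main() {
	same, err := sameFilesystem("/", "/proc")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("/proc is on the same filesystem as /:", same)
}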

Slow backup on Raspberry Pi 2 model b [Previously: Alternate compression settings for same storage per repo]

Is it possible to have different compression settings per repo for the same storage?

For example, if I have two machines backing up to the same storage, but one has an under-powered processor, is it possible to have that lower spec machine use less demanding compression whilst the more powerful machine can still use higher compression.

This is because I am seeing extremely slow backups from a raspberry pi and am assuming the compression is to blame.

Request: flag for silent backup if nothing fails

I use this awesome backup program to backup my minecraft world every 10 minutes and every time I get an mail on the server from the cronjob with the output. I would like to see a flag to disable output if now error occurs.

Suggestion: Easier navigation of files in snapshot

When using list -files it would be useful to be able to pre-filter the path(s) to list. I imagine this could improve the speed at which the list is generated also.

For example: duplicacy list -files some/path/in/set would limit to only show paths starting with some/path/in/set.

Coupled with a depth option this would make it very easy to navigate / browse a backup snapshot directly using the command line without having to post process the entire list.

I'm imagining something like:

$ duplicacy list -files -depth 0
apps/
data/
scratch/
$ duplicacy list -files -depth 0 apps/
vscode/
atom/
diskitude.exe
...
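
Until something like -depth exists, a similar effect can be had by post-processing the output of list -files. A hedged sketch that reads plain paths on stdin (i.e. after stripping the size/date/hash columns) and emulates a hypothetical -depth 0 listing under a prefix:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Reads file paths (one per line) on stdin, e.g. produced from the output of
// `duplicacy list -files`, and prints the unique entries directly under the
// given prefix, emulating a hypothetical `-depth 0 apps/` invocation.
func main() {
	prefix := "apps/"
	seen := make(map[string]bool)

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		path := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(path, prefix) {
			continue
		}
		rest := strings.TrimPrefix(path, prefix)
		// Keep only the first path component below the prefix; the trailing
		// "/" marks directories.
		if i := strings.Index(rest, "/"); i >= 0 {
			rest = rest[:i+1]
		}
		if rest != "" && !seen[rest] {
			seen[rest] = true
			fmt.Println(rest)
		}
	}
}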

Feature request: pre- and post-backup scripts

It might be helpful to users to be able to specify a shell script to run before and after backups (e.g., to mount a remote filesystem prior to the backup and unmount it afterwards). I can do this by putting the entire backup operation in a shell script, but then I lose the GUI.
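
In the meantime, a thin wrapper around the CLI can approximate this (though, as noted, it bypasses the GUI). A hedged Go sketch; the hook script paths and the backup arguments are purely illustrative:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run executes a command, wiring its output to the terminal, and aborts the
// whole backup if it fails.
func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println(name, "failed:", err)
		os.Exit(1)
	}
}

func main() {
	run("/usr/local/bin/pre-backup.sh")  // e.g. mount the remote filesystem
	run("duplicacy", "backup", "-stats")
	run("/usr/local/bin/post-backup.sh") // e.g. unmount it again
}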
