rokups / zinc

Block level data synchronization library

License: MIT License

CMake 9.48% C++ 90.52%
data-synchronization data-sync rsync zsync binary-data file-download

zinc's Introduction

zinc - the reverse-rsync

WARNING: DO NOT USE IT WITH IMPORTANT AND/OR NOT BACKED UP DATA. This code is of alpha quality.

rsync has become synonymous with efficient data synchronization. It has a few issues, however: the server does the heavy lifting, and the code is licensed under the GPL.

The zsync project attempted to move the rsync algorithm fully into the client. However, it no longer appears to be maintained, its license is also non-permissive, and its code is not easily embeddable into other projects. This library is a humble attempt to fix these issues.

Features

  • Block level file synchronization - downloads only missing pieces, reuses existing data.
  • No special server setup - any HTTP(S) server supporting the Range header will do.
  • Files are updated in-place - huge files of tens of gigabytes are not copied, and only the changed parts are written. Your SSD will be happy.
  • Progress reporting callbacks.
  • C++11 required.
  • Example implementation of a synchronization tool written in C++.
  • Multithreaded.
  • Free as in freedom - use it however you like, in open source (preferably) or proprietary software, no strings attached.
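
Because any server that honors the Range header can act as the remote end, fetching one block comes down to requesting a byte range. The sketch below shows how such a header value could be formed. `make_range_header` is a hypothetical helper written for illustration, not part of zinc's API; it assumes a block is described by a start offset and a length, as in the pseudocode under "How it works".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of zinc): builds the value of an HTTP Range
// header that requests `length` bytes starting at byte offset `start`.
// HTTP byte ranges are inclusive on both ends, hence the "- 1".
std::string make_range_header(std::uint64_t start, std::uint64_t length)
{
    assert(length > 0 && "a block must contain at least one byte");
    const std::uint64_t last = start + length - 1;
    return "bytes=" + std::to_string(start) + "-" + std::to_string(last);
}
```

A request for the first 1024 bytes of the remote file would then carry `Range: bytes=0-1023`, and a server that supports range requests answers with status 206 (Partial Content) and only those bytes.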

How it works

  +-------------------------------- Server -------------------------------+
  | new_boundary_list = partition_file(new_file);                         |
  +-----------------------------------------------------------------------+
                                    |
                     [ Transport (for example http) ]
                                    |
  +---------------------------------- Client -----------------------------+
  | old_boundary_list = partition_file(old_file);                         |
  | delta = compare_files(old_boundary_list, new_boundary_list);          |
  | // Patch file                                                         |
  | for (const auto& operation : delta)                                   |
  | {                                                                     |
  |     if (operation.local == nullptr)                                   |
  |     {                                                                 |
  |         // Download block from remote file                            |
  |         auto* remote = operation.remote;                              |
  |         void* data = download_block(remote->start, remote->length);   |
  |         fp_old->seek(remote->start);                                  |
  |         fp_old->write(data, remote->length);                          |
  |     }                                                                 |
  |     else                                                              |
  |     {                                                                 |
  |         // Copy block from local file                                 |
  |         fp_old->seek(operation.local->start);                         |
  |         void* data = fp_old->read(operation.local->length);           |
  |         fp_old->seek(operation.remote->start);                        |
  |         fp_old->write(data, operation.local->length);                 |
  |     }                                                                 |
  | }                                                                     |
  +-----------------------------------------------------------------------+

On the server end, new_boundary_list should be calculated once and written to a file for retrieval by the client. The library is transport-agnostic, so you may use any transport you desire. HTTP is named as the suggested transport because it eliminates the need for any custom server setup and is the most convenient option available today.

As you can see from the diagram above, the process of synchronizing data is composed of three steps:

  1. Hashing: The latest version of the file is split into variable-size blocks, and strong and weak hashes are calculated for every block.
  2. Delta calculation: The client obtains the list of blocks, splits the local file into variable-size blocks, then compares the block lists of both files to determine which parts should be moved and which parts should be downloaded.
  3. Patching: Using the calculated delta map, blocks in the local file are rearranged much like puzzle pieces, and missing pieces are downloaded.
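
The delta calculation step can be sketched as follows. This is a minimal illustration under stated assumptions, not zinc's actual implementation: the `Block` and `Operation` types and the `compare_blocks` function are hypothetical names that mirror the pseudocode above, and hashes are assumed to be precomputed. The weak hash serves as a cheap lookup key, and the strong hash confirms a candidate match.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical block descriptor: position in the file plus both hashes.
struct Block
{
    std::uint64_t start;
    std::uint64_t length;
    std::uint32_t weak;    // cheap hash, used only to find candidates
    std::string   strong;  // strong hash, used to confirm a match
};

// One entry of the delta: where the data must end up (remote) and where
// it can be found locally (local), or nullptr if it must be downloaded.
struct Operation
{
    const Block* remote;
    const Block* local;
};

// Assumed sketch of the comparison: index local blocks by weak hash, then
// look up every remote block and confirm candidates by length and strong hash.
std::vector<Operation> compare_blocks(const std::vector<Block>& local_blocks,
                                      const std::vector<Block>& remote_blocks)
{
    std::unordered_multimap<std::uint32_t, const Block*> by_weak;
    for (const Block& block : local_blocks)
        by_weak.emplace(block.weak, &block);

    std::vector<Operation> delta;
    delta.reserve(remote_blocks.size());
    for (const Block& remote : remote_blocks)
    {
        const Block* match = nullptr;
        auto range = by_weak.equal_range(remote.weak);
        for (auto it = range.first; it != range.second; ++it)
        {
            const Block* candidate = it->second;
            if (candidate->length == remote.length &&
                candidate->strong == remote.strong)
            {
                match = candidate;
                break;
            }
        }
        delta.push_back({&remote, match});
    }
    return delta;
}
```

The returned operations point into the two input vectors, so both must outlive the delta. The sketch also ignores ordering: when patching in place, a copy can overwrite data that a later operation still needs to read, so a real implementation has to schedule or buffer such blocks.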

Example

The project comes with a testing tool, zinc, which is used mainly for debugging. The tool reads and writes local files. The example below was performed in tmpfs; the test files are two tar archives. new.tar contains 10 binary files of 10MB each. old.tar is a copy of new.tar with one (middle) file removed, which leaves a 10MB "hole" in the middle of the file. The test was performed on an i7-6800K CPU (6 cores / 12 threads). Because of tmpfs, you may treat the timing results as a benchmark of the core algorithm, since file reading and writing basically happened in memory.

/tmp % sha1sum *.tar
6b9d22479a91b25347842f161eff53eab050b5d1  new.tar
71cf71c7d1433682a4b0577d982dcd5956233e7c  old.tar

/tmp % # Hash the new file. The produced json file is hosted on a remote (web)server along with the file itself
/tmp % zinc hash new.tar 
[########################################]

/tmp % ls new.tar*
new.tar  new.tar.json

/tmp % # Client system obtains json file with hashes from a remote server and finds different and matching blocks
/tmp % # Client system then moves existing matching blocks to their new locations while downloading missing blocks from remote server
/tmp % time zinc sync old.tar new.tar
[########################################]
Copied bytes: 51987553
Downloaded bytes: 14265606
Download savings: 87%
zinc sync old.tar   0.73s user 0.33s system 533% cpu 0.199 total

/tmp % # File was updated in less than a second
/tmp % sha1sum *.tar
6b9d22479a91b25347842f161eff53eab050b5d1  new.tar
6b9d22479a91b25347842f161eff53eab050b5d1  old.tar

Other similar software

  • rsync - inspiration for zinc
  • zsync - inspiration for zinc
  • xdelta - delta updates library
  • goodsync - proprietary block level synchronization utility

zinc's Issues

Error with small file

Hello,

First, thanks for your work, it's a very interesting project.

I think I found a bug when hashing small files.
I get the following error:

terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_create

I understand that syncing small files is not the goal of this project, but I would like to make an updater based on your library, and I have small files to sync.

Regards,
Catsy

Edit: I'm on Windows 10, compiling with MinGW-W64 v7.2.0
