ynikitenko / yarsync

Yet Another Rsync: a file synchronization and backup tool

License: GNU General Public License v3.0

Python 100.00%
backup mirroring rsync-wrapper synchronization administration-tools linux unix

yarsync's Introduction

YARsync

Yet Another Rsync is a file synchronization and backup tool. It can be used to synchronize data between different hosts or locally (for example, to a backup drive). It provides a familiar git command interface while working with files.

YARsync is a Free Software project covered by the GNU General Public License version 3.

Installation

yarsync is packaged for Debian/Ubuntu.

For Arch Linux, install the yarsync package from AUR. Packages for other distributions are welcome.

For an installation from PyPI, run

pip3 install yarsync

On macOS Ventura, the built-in rsync is version 2.6.9, while yarsync requires a newer version of rsync. Install one with Homebrew first:

brew install rsync
pip3 install yarsync

If the error rsync: --outbuf=L: unknown option occurs, make sure that a newer version of rsync has been installed and is the one actually being run.
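To see which rsync is picked up, its version can be read from the output of rsync --version. The following is a minimal sketch, not part of yarsync: the function names are hypothetical, and it assumes that the version number appears as "version X.Y.Z" in that output.

```python
import re
import subprocess


def parse_rsync_version(version_output):
    """Extract (major, minor, patch) from the text printed by `rsync --version`."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an rsync version number")
    return tuple(int(part) for part in match.groups())


def installed_rsync_version():
    """Ask the rsync that is found first on PATH for its version."""
    result = subprocess.run(
        ["rsync", "--version"], capture_output=True, text=True, check=True
    )
    return parse_rsync_version(result.stdout)
```

If installed_rsync_version() still reports (2, 6, 9) after brew install rsync, the built-in rsync is probably still found first on PATH.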

Since there is no general way to install a manual page for a Python package, one has to do it manually. For example, run as a superuser:

wget https://github.com/ynikitenko/yarsync/raw/master/docs/yarsync.1
gzip yarsync.1
mv yarsync.1.gz /usr/share/man/man1/
mandb

Make sure that the manual path for your system is correct. The command mandb updates the index caches of manual pages.

One can also install the most recent program version from GitHub. It incorporates the latest improvements, but at the same time is less stable (new features can be changed or removed).

git clone https://github.com/ynikitenko/yarsync.git
pip3 install -e yarsync

This installs the yarsync executable to ~/.local/bin, and does not require modifications of PYTHONPATH. After that, one can pull the repository updates without reinstallation.

To uninstall, run

pip3 uninstall yarsync

and remove the cloned repository.

Design and features

yarsync can be used to manage hierarchies of unchanging files, such as music, books, articles, photographs, etc. Its final goal is to have the same state of files across different computers. It also allows one to store backup copies of data and easily copy, update or recover them. yarsync is

distributed

There is no central host or repository for yarsync. If different replicas diverge, the program assists the user in merging the repositories manually.

efficient

The program is run only on user demand, and does not consume system resources constantly. Already transferred files will never be transmitted again. This allows the user to rename or move files or whole directories at no cost, which encourages continuous improvement of the repository.

non-intrusive

yarsync does nothing to user data. It has no complicated packing or unpacking. All user data and program configuration are stored as ordinary files in the file system. If one decides to stop using yarsync, they can simply remove the configuration directory at any time.

simple

yarsync does not implement complicated file transfer algorithms, but uses an existing, widely accepted and tested tool for that. User configuration is stored in simple text files, and repository snapshots are ordinary directories, which can be modified, copied or browsed from a file manager. All standard command line tools can be used in the repository, to assist its recovery or to allow any non-standard operations (for users who understand what they are doing). Read the yarsync documentation to understand its (simple) design.

safe

yarsync does its best to preserve user data. It always allows one to see what will be done before any actual modifications (--dry-run). This is an advantage over continuous synchronization tools, which may be dangerous if the local repository gets corrupted (e.g. encrypted by a trojan). Removed files are stored in older commits (until the user explicitly removes those).

Commands

checkout
clone
commit
diff
init
log
pull
push
remote
show
status

See yarsync --help for full command descriptions and options.

Requirements and limitations

yarsync is a Python wrapper (available for Python>=3.6) around rsync and requires a file system with hard links. Since these are very common tools, this means that it can easily run on any UNIX-like system. Moreover, yarsync is not required to be installed on the remote host: it is sufficient for rsync to be installed there.
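The hard link requirement can be illustrated with a small, self-contained sketch. It is not yarsync's code, and the helper names and the flat directory layout are assumptions made for the example. It shows only the general property that a hard-linked copy shares its data with the original and survives its removal.

```python
import os
import tempfile


def snapshot_with_hard_links(work_dir, snapshot_dir):
    """Hard-link every regular file directly inside work_dir into snapshot_dir."""
    os.makedirs(snapshot_dir, exist_ok=True)
    for name in sorted(os.listdir(work_dir)):
        source = os.path.join(work_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))


def demo():
    """Return (same_inode, link_count, survives_removal) for one linked file."""
    with tempfile.TemporaryDirectory() as root:
        work = os.path.join(root, "work")
        snap = os.path.join(root, "snapshot")
        os.makedirs(work)
        original = os.path.join(work, "photo.jpg")
        with open(original, "w") as handle:
            handle.write("image data")
        snapshot_with_hard_links(work, snap)
        linked = os.path.join(snap, "photo.jpg")
        # Both names point to the same inode, so the data is stored only once.
        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
        link_count = os.stat(original).st_nlink
        # Removing one name leaves the data reachable through the other.
        os.remove(original)
        return same_inode, link_count, os.path.exists(linked)
```

On a file system without hard link support, os.link raises an OSError, which is a quick way to check whether a given drive is suitable.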

In particular, rsync:

  • is installed on most GNU/Linux distributions,
  • is installed on macOS,
  • can be installed on Windows.

yarsync runs successfully on Linux. Please report to us if you have problems (or success) running it on your system.

Safety

yarsync has been used by the author for several years without problems and is tested. However, any data synchronization may lead to data loss, and it is recommended to have several data copies and always do a --dry-run (-n) first before the actual transfer.
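The "dry run first" habit can also be scripted. The sketch below calls plain rsync rather than yarsync, and the command line it builds is not the one yarsync itself uses. The function names are hypothetical, while -a, --itemize-changes and --dry-run are standard rsync options.

```python
import subprocess


def rsync_command(source, destination, dry_run=True):
    """Build a plain rsync call; with dry_run=True nothing is modified."""
    command = ["rsync", "-a", "--itemize-changes"]
    if dry_run:
        command.append("--dry-run")
    return command + [source, destination]


def preview_then_transfer(source, destination, confirm=input):
    """Show the planned changes, then transfer only after explicit approval."""
    subprocess.run(rsync_command(source, destination, dry_run=True), check=True)
    if confirm("Apply these changes? [y/N] ").strip().lower() == "y":
        subprocess.run(rsync_command(source, destination, dry_run=False), check=True)
        return True
    return False
```

The confirmation callback defaults to input, so an unattended script would have to pass its own decision function explicitly.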

Documentation

For the complete documentation, read the installed or online manual.

For more in-depth topics or alternatives, see details.

Release notes can be found in the GitHub repository. The manual for yarsync 0.1 is available on GitHub Pages.

An article in Russian that deals more with yarsync internals was posted on Habr.

Thanks

A good number of people have contributed to the improvement of this software. I'd like to thank Nilson Silva for packaging yarsync for Debian, Mikhail Zelenyy from MIPT NPM for the explanation of Python entry points, Jason Ryan and Matthew T Hoare for the inspiration to create a package for Arch, Scimmia for a comprehensive review and suggestions for my PKGBUILD, Open Data Russia chat for discussions about backup safety, Habr users and editors, and, finally, the creators and developers of git and rsync.

yarsync's People

Contributors

r888800009, ynikitenko

yarsync's Issues

Clone two mount point question

Hello,

I am running a headless server with Arch Linux and want to back up one hard drive to another one, so it needs to copy new files but also to delete files that were removed. There are also a couple of folders on the hard drive that must be ignored.

So the question is, is it possible? The plan would be to write a script that is run by a cron job.
And the second question (sorry, but I couldn't find the info on the wiki page): what would the command be?

yarsync clone /mnt/hd /mnt/hd-backup ? But how to tell it to ignore a folder?

Thank you for your time.

[Bug] `yarsync push` overwrite uncommitted modified files on the remote

Hello, I am looking for a synchronization tool similar to git but that can handle FS metadata (timestamp), and found your project that seems to be a good match. Now I am trying to evaluate whether it is suitable for my use case.

But when pushing, some data on the remote that has not yet been committed is overwritten.

Version

yarsync version 0.2.1

Operating System

macOS Ventura and Arch Linux

What happened

We have two computers, A and B. Create a repo, create file1 and commit, and make sure that A and B each have a copy of the repo.

Then modify file1 on B and push from A: we can see that file1 on B has been overwritten.

How to reproduce

init repo

# on A
mkdir test1
cd test1
yarsync init
yarsync commit -m "Initial commit"
yarsync remote add my_remote host:/tmp/test1-repo

# on B
mkdir /tmp/test1-repo
cd /tmp/test1-repo
yarsync init

Then push the repo from A to B, create file1, commit, and push again

# on A
yarsync push my_remote

echo testfile > file1
yarsync commit
yarsync push my_remote

Modify file1 on B

$ echo 'should not be overwritten' > file1
$ cat file1
should not be overwritten

Then push from A again. We can see that the file is overwritten

# on A
yarsync push my_remote

# on B
$ cat file1
testfile

What you think should happen instead

yarsync should detect this and avoid overwriting data, reporting that the remote has uncommitted changes.

If this is not the intended use, it must be aborted to avoid data loss.

improve github-pages

  • center the top button "View On GitHub" vertically,
  • create navigation between pages?
  • use hacker theme instead of midnight? Decide on its readability first. At the moment fonts are unreadable. Green sections look awesome, but maybe white is less distracting and better for reading a manual?
  • can't link to document sections (including manual and inter-manual links),
  • can't copy code with a click (as it is done on GitHub); not essential, but nice to have,
  • add an arrow "to the top of the page" at the bottom?
  • manual still looks a bit better in the terminal.

[Bug] `yarsync show` crash on AttributeError: 'NoneType' object has no attribute 'by_repos'

Version

yarsync version 0.2.1
python3.11

Operating System

macOS Ventura and Arch Linux

How to reproduce

mkdir repo
cd repo
yarsync init
yarsync commit -m "Initial commit"
echo 'test' > test
yarsync commit -m "test commit"

Then we try to get the commit id

$ yarsync log
No synchronization directory found.
No synchronization information found.
commit 1699541234
test commit

...

triggering crash

yarsync show 1699541234

output and traceback

No synchronization directory found.
No synchronization information found.
Traceback (most recent call last):
  File "/usr/bin/yarsync", line 33, in <module>
    sys.exit(load_entry_point('yarsync==0.2.1', 'console_scripts', 'yarsync')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yarsync/yarsync.py", line 2857, in main
    returncode = ys()
                 ^^^^
  File "/usr/lib/python3.11/site-packages/yarsync/yarsync.py", line 2807, in __call__
    returncode = self._func()
                 ^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yarsync/yarsync.py", line 2573, in _show
    self._print_log(commit, log, sync)
  File "/usr/lib/python3.11/site-packages/yarsync/yarsync.py", line 2053, in _print_log
    if commit in sync.by_repos.values():
                 ^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'by_repos'
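The traceback shows that sync is None when no synchronization information is found, yet its by_repos attribute is accessed unconditionally. A minimal sketch of a guard follows. It is not the project's actual fix: SyncInfo and synced_repos are hypothetical stand-ins, and the assumption that by_repos maps repository names to commits is inferred only from the line in the traceback.

```python
class SyncInfo:
    """Hypothetical stand-in for the object that carries `by_repos`."""

    def __init__(self, by_repos):
        self.by_repos = by_repos  # assumed: repository name -> last synced commit


def synced_repos(commit, sync):
    """Return the names of repositories synchronized at `commit`, if known."""
    if sync is None:
        # No synchronization information was found, so there is nothing to report.
        return []
    return sorted(repo for repo, synced in sync.by_repos.items() if synced == commit)
```

With such a check, a repository that has never been synchronized yields an empty result instead of an AttributeError.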

Automatic Man Page Installation Via Setup.py

The README states: "Since there is no general way to install a manual page for a Python package, one has to do it manually." I'm not sure this is entirely accurate, as the setuptools.setup function accepts a data_files argument, which makes it possible to install files to another location on the system. As this StackOverflow answer states, one could use this to install man pages. I've tested this and it works on my Arch Linux system, although I haven't tested it further than that. I'm happy to make a PR including this feature, but only if you think that this is a feature you would consider in this project.

TLDR: It may be possible to automatically install man pages via setup.py. Is it worthwhile to pursue adding this as a feature?
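A sketch of what such a setup.py fragment might look like is given below. It is an illustration under assumptions, not the project's real setup.py: all other metadata is omitted, the main wrapper is hypothetical, and the source path docs/yarsync.1 is taken from the download URL in the README. Relative target directories in data_files are resolved against the installation prefix, so the final location depends on how the package is installed.

```python
# Sketch of a setup.py fragment; real metadata (version, packages, ...) is omitted.
MAN_PAGES = [("share/man/man1", ["docs/yarsync.1"])]


def main():
    # Imported here so that the list above can be inspected without setuptools.
    from setuptools import setup

    setup(name="yarsync", data_files=MAN_PAGES)
```

In a real setup.py, main() would be called at module level. Whether the installed page is then found by man depends on the prefix being part of the manual path.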

[Feature Request] Untrack unwanted file/dir like `.gitignore`

Hi, I think it might be a useful feature if .yarsyncignore could be implemented in yarsync.

This would avoid committing and synchronizing files and folders such as __pycache__, .DS_Store and some-program.lock.

It seems that this feature could be implemented through rsync's --exclude-from= option.
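One way the idea could look in code is sketched below. This is not an existing yarsync feature: the function name is hypothetical and the file name .yarsyncignore is the one proposed in this request. Only --exclude-from=FILE is an existing rsync option.

```python
import os


def exclude_arguments(repo_root, ignore_name=".yarsyncignore"):
    """Return extra rsync arguments if the repository contains an ignore file."""
    ignore_path = os.path.join(repo_root, ignore_name)
    if os.path.isfile(ignore_path):
        return ["--exclude-from=" + ignore_path]
    return []
```

The returned list could be appended to the rsync command line, so repositories without an ignore file behave exactly as before.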

Thank you

hard links don't work with ignore-existing

The rsync option --ignore-existing seems to conflict with -H (hard links). I filed an issue for rsync, but got no reply yet.
If this doesn't get fixed (and no way to work around it in yarsync is suggested), then ignoring existing files will not work. This is a nice security feature, which was introduced right before the 0.1 release.

At the moment, use the --overwrite option during yarsync pull/push. Alternatively, use the hardlink executable afterwards (this doesn't always work; in that case remove new commits and re-synchronize with --ignore-existing).
clone, commit and other commands work fine, because --ignore-existing is used only in pull/push.

fix pdf formatting

There are some problems with the pdf version produced by Sphinx or Read the Docs.

  • TOC. I could find no way to properly incorporate the manual into the document (it is either in the section "thanks", or I have to include it twice to get a dedicated toc entry).
  • formatting. Some strings are formatted incorrectly. This can be due to manual formatting (it is optimised for man). Will anything be improved if we use RST instead of Markdown?
  • Release notes. I would add them in a section after Advanced, but could not do it well yet.

Add rich text formatting (colours, boldness) to the output via click

UPD: click looks good for that.

Requirements for a formatting tool:

  • cross-platform (and disabled when it is not installed / supported)
  • high-level (and take care to restore original colours after the exit).

Probably some other factors should also be taken into consideration. Note that we don't need full True Colour support; we only want to enhance the user experience in a few small places.

These packages look good: yachalk and colorful (the latter looks more powerful, because it supports Windows and Python 2 (if we ever use that)). There is also some comparison on StackOverflow.

Formatting could be based on how git formats its output.
Volunteers are welcome! If you take on this task, let us agree on the package first.
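A minimal sketch of how the two requirements could be met with click is given below. The helper names emphasize and show are hypothetical, and nothing here is an agreed design. click.style and click.echo are existing click functions, and click.echo strips the styling when the output is not a terminal.

```python
try:
    import click
except ImportError:  # formatting is optional: fall back to plain text
    click = None


def emphasize(text, colour="yellow", enabled=True):
    """Return text styled for the terminal, or unchanged if styling is unavailable."""
    if click is None or not enabled:
        return text
    return click.style(text, fg=colour, bold=True)


def show(text):
    """Print text; click.echo removes ANSI codes when output is not a terminal."""
    if click is None:
        print(text)
    else:
        click.echo(text)
```

For example, show(emphasize("commit 1699541234")) would highlight a commit line in a terminal and print it plainly when redirected to a file.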
