Rsync is a Linux command-line tool that transfers files to or from a remote host, or locally on the same machine (but not between two remote hosts). You can transfer a single file or multiple files matching a pattern. In this article, I first introduce rsync and its use on Linux, then describe a method using rsync that replicates Time Machine, the backup process built into Apple's Mac platform.
Description
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specifications of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Rsync finds files that need to be transferred using a "quick check" algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated.
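To make the quick check concrete, here is a minimal Python sketch of the per-file decision. It also shows what changes when you pass --checksum (-c). This is an illustration, not rsync's implementation. needs_transfer() and file_digest() are hypothetical helpers. Modification times are compared in whole seconds, and SHA-256 simply stands in for whichever checksum your rsync version actually uses.

```python
# Hypothetical sketch of rsync's transfer decision; not rsync's own code.
import hashlib
import os
import shutil
import tempfile


def file_digest(path: str) -> str:
    """Hash a file's full contents; only needed in checksum mode."""
    h = hashlib.sha256()  # stand-in for rsync's real checksum algorithm
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src: str, dst: str, checksum: bool = False) -> bool:
    """Approximate rsync's per-file decision (simplified)."""
    if not os.path.exists(dst):
        return True  # nothing on the receiving side yet
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # a size change always means the data differs
    if checksum:
        # Like --checksum: compare contents and ignore modification times.
        return file_digest(src) != file_digest(dst)
    # Default quick check: same size, so only a different mtime triggers a copy.
    return int(s.st_mtime) != int(d.st_mtime)


# Demo: a same-size edit with the old mtime restored fools the quick check.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src.txt")
dst = os.path.join(workdir, "dst.txt")
with open(src, "w") as f:
    f.write("hello")
missing = needs_transfer(src, dst)        # dst does not exist yet
shutil.copy2(src, dst)                    # copy data and timestamps
after_copy = needs_transfer(src, dst)
with open(dst, "w") as f:
    f.write("HELLO")                      # same size, different content
st = os.stat(src)
os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))  # put the old mtime back
quick = needs_transfer(src, dst)
full = needs_transfer(src, dst, checksum=True)
print(missing, after_copy, quick, full)
shutil.rmtree(workdir)
```

The demo prints `True False False True`. When a file's content changes but its size and timestamp do not, the default quick check skips it and only checksum mode notices. The cost of checksum mode is reading every file on both sides.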
General
To reiterate, rsync copies files either to or from a remote host, or locally on the current host; it does not support copying files between two remote hosts.
There are two different ways for rsync to contact a remote system: using a remote-shell program as the transport (such as ssh or rsh) or contacting an rsync daemon directly via TCP. The remote-shell transport is used whenever the source or destination path contains a single colon (:) separator after a host specification. Contacting an rsync daemon directly happens when the source or destination path contains a double colon (::) separator after a host specification, OR when an rsync:// URL is specified.
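The rules for telling these forms apart can be summarized in a short Python sketch. transport_for() is a hypothetical helper that only inspects the syntax of a single path argument. It is simplified; for example, it ignores bracketed IPv6 addresses. It also treats a colon that appears after a slash as part of a local filename. I believe that matches rsync's behavior, but treat that rule as an assumption.

```python
# Hypothetical helper: classify an rsync path argument by its syntax alone.
def transport_for(path: str) -> str:
    """Return "daemon", "remote-shell", or "local" (simplified sketch)."""
    if path.startswith("rsync://"):
        return "daemon"  # rsync:// URL: connect to an rsync daemon over TCP
    colon = path.find(":")
    # No colon, or a slash before the first colon: treated as a local path.
    if colon == -1 or "/" in path[:colon]:
        return "local"
    if path.startswith("::", colon):
        return "daemon"  # host::module form
    return "remote-shell"  # host:path form, carried over ssh or rsh


for arg in ["bkhost:backup/joe/", "rhost::module/dir",
            "rsync://rhost/module/dir", "/tmp/a:b", "notes.txt"]:
    print(f"{arg:26} -> {transport_for(arg)}")
```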
As a special case, if a single source argument is specified without a destination, the files are listed in an output format similar to "ls -l".
As expected, if neither the source nor the destination path specifies a remote host, the copy occurs locally.
Rsync refers to the local side as the client and the remote side as the server. Don't confuse a server with an rsync daemon: a daemon is always a server, but a server can be either a daemon or a process spawned by a remote shell.
Some Example Commands
To back up a home directory that contains large files and mail folders, a per-user cron job can run this command each day (the . refers to the current directory, which should be the home directory being backed up):
rsync -aiz . bkhost:backup/joe/
To move some files from a remote host to the local host, so that each source file is deleted once it has been transferred, you could run:
rsync -aiv --remove-source-files rhost:/tmp/{file1,file2}.c ~/src/
Option Summary
Here is a short summary of some of the more common options available for rsync.
Options | Summary |
---|---|
--verbose, -v | increase verbosity |
--info=FLAGS | fine-grained informational verbosity |
--debug=FLAGS | fine-grained debug verbosity |
--stderr=e\|a\|c | change stderr output mode (default: errors) |
--quiet, -q | suppress non-error messages |
--no-motd | suppress daemon-mode MOTD |
--checksum, -c | skip based on checksum, not mod-time & size |
--archive, -a | archive mode is -rlptgoD (no -A,-X,-U,-N,-H) |
--no-OPTION | turn off an implied OPTION (e.g. --no-D) |
--recursive, -r | recurse into directories |
--relative, -R | use relative path names |
--no-implied-dirs | don't send implied dirs with --relative |
--backup, -b | make backups (see --suffix & --backup-dir) |
--backup-dir=DIR | make backups into hierarchy based in DIR |
--suffix=SUFFIX | backup suffix (default ~ w/o --backup-dir) |
--update, -u | skip files that are newer on the receiver |
--inplace | update destination files in-place |
--append | append data onto shorter files |
--append-verify | --append w/old data in file checksum |
--dirs, -d | transfer directories without recursing |
--mkpath | create the destination's missing path components |
--links, -l | copy symlinks as symlinks |
--copy-links, -L | transform symlink into referent file/dir |
--copy-unsafe-links | only "unsafe" symlinks are transformed |
--safe-links | ignore symlinks that point outside the tree |
--munge-links | munge symlinks to make them safe & unusable |
--copy-dirlinks, -k | transform symlink to dir into referent dir |
--keep-dirlinks, -K | treat symlinked dir on receiver as dir |
--hard-links, -H | preserve hard links |
These are only a few of the options available for rsync. For the full list, see the rsync man page (man rsync).
Time-Machine Replication Process
Time Machine is the built-in backup feature of the Apple Mac. It automatically backs up your personal data, including apps, music, photos, email, and documents. If you ever delete files or, for whatever reason, can't access them, you can quickly restore them, or your whole Mac, from a Time Machine backup.
Time Machine keeps hourly backups for the past 24 hours, daily backups for the past month, and weekly backups for all previous months. The oldest backups are deleted when the backup disk is full.
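A minimal Python sketch of this schedule might look like the following. It only decides which backup timestamps to keep; it does not handle deleting backups when the disk is full. backups_to_keep() is a hypothetical helper, not code from Time Machine or from Laurent22's script. It approximates "the past month" as 30 days and assumes the newest backup in each day or ISO week is the one worth keeping.

```python
# Hypothetical sketch of a Time Machine-style retention schedule.
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now):
    """Return the backups this schedule would retain, oldest first."""
    kept = []
    seen = set()  # day or week buckets that already have a backup
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            kept.append(ts)  # keep every hourly backup from the last day
            continue
        if age <= timedelta(days=30):
            bucket = ("day", ts.date())  # one backup per day for a month
        else:
            iso = ts.isocalendar()
            bucket = ("week", iso[0], iso[1])  # one per ISO week after that
        if bucket not in seen:  # first hit is the newest in its bucket
            seen.add(bucket)
            kept.append(ts)
    return sorted(kept)


# Demo: 60 days of hourly backups taken up to a fixed "now".
now = datetime(2024, 3, 1, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(24 * 60)]
kept = backups_to_keep(hourly, now)
print(f"{len(hourly)} backups on disk, {len(kept)} kept")
print("oldest kept:", kept[0])
```

With 60 days of hourly backups (1,440 snapshots), this policy keeps 60 of them:

- 25 from the last 24 hours.
- One per day for the 30 days before that.
- One per ISO week for the remainder.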
Laurent22 on GitHub has created a neat little script that reproduces this schedule. Run from cron on Linux, it backs up your data automatically while monitoring your backup device; if the device starts to fill up, the oldest backups are deleted.
Laurent22's script offers Time Machine-style backup using rsync. The script creates incremental backups of files and directories to the destination of your choice. These backups are structured in a way that makes it easy to recover any file at any point in time.
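Because every backup is a complete, timestamped folder, recovering a file as it was at some moment comes down to picking the newest folder taken at or before that time. The Python sketch below shows the idea. version_at() is a hypothetical helper. The YYYY-MM-DD-HHMMSS folder-name format is my assumption, so check the names in your own backup directory before relying on it.

```python
# Hypothetical point-in-time lookup over timestamped backup folders.
import os
import shutil
import tempfile
from datetime import datetime

STAMP = "%Y-%m-%d-%H%M%S"  # assumed folder-name format, e.g. 2024-03-01-120000


def version_at(backup_root: str, rel_path: str, when: datetime):
    """Return rel_path inside the newest backup taken at or before `when`."""
    snapshots = []
    for name in os.listdir(backup_root):
        try:
            snapshots.append((datetime.strptime(name, STAMP), name))
        except ValueError:
            continue  # skip backup.marker, the latest symlink, etc.
    for taken, name in sorted(snapshots, reverse=True):  # newest first
        if taken <= when:
            candidate = os.path.join(backup_root, name, rel_path)
            # None if the file did not exist when that backup was made.
            return candidate if os.path.exists(candidate) else None
    return None  # no backup existed yet at that time


# Demo: two hourly backups of the same document.
root = tempfile.mkdtemp()
for stamp, text in [("2024-03-01-100000", "draft"), ("2024-03-01-110000", "final")]:
    os.makedirs(os.path.join(root, stamp, "docs"))
    with open(os.path.join(root, stamp, "docs", "report.txt"), "w") as f:
        f.write(text)
open(os.path.join(root, "backup.marker"), "w").close()
os.symlink("2024-03-01-110000", os.path.join(root, "latest"))

found = version_at(root, "docs/report.txt", datetime(2024, 3, 1, 10, 30))
with open(found) as f:
    content = f.read()
too_early = version_at(root, "docs/report.txt", datetime(2024, 3, 1, 9, 0))
print(content, too_early)
shutil.rmtree(root)
```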
The script works on Linux, macOS, and Windows (via WSL or Cygwin). Its main advantage over Time Machine is flexibility: it can back up from and to any filesystem and is cross-platform. For example, you can back up to a TrueCrypt drive without any problem.
Laurent22's GitHub project site has the installation instructions for cloning the project, the usage of his script for Time Machine-style backups, and some examples of his backup recommendations.
I have set up Laurent22's script, which replicates the Time Machine-style strategy for backing up your data. I created an entry in my /etc/crontab file that runs it every hour and backs up my entire $HOME directory to a folder located at /media/BackupDrive/HomeDir_Backup on my system. Check out a screenshot of this process.
A backup.marker file must exist in the folder where the backups are stored, or the script will not run. The script also maintains a symlink called latest that points to the most recent backup folder. The figure above shows 18 folders, one for each completed hourly backup. The first, full backup took roughly 20 minutes and occupies approximately 126GB of drive space. Each subsequent backup takes only about 13 seconds on my system, and all 18 folders together take up just over 130GB.

This works because each later backup stores new copies only of files that are new or have changed since the previous backup. Every unchanged file is a hard link to the same data already on the backup drive, so it consumes no additional space. A file manager therefore shows each backup folder as a complete snapshot of the home directory, even though most of its files share their data with earlier backups. Because hard links share the underlying data, deleting an older backup folder does not remove files that newer backups still reference.
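The hard-link trick is easier to see in code. The sketch below is a pure-Python imitation of what rsync's --link-dest option does; to my understanding, that is the mechanism Laurent22's script relies on. It is not his script. snapshot(), its arguments, and the timestamp folder names are hypothetical. The unchanged-file test is a simplified quick check (size plus whole-second mtime). It needs a filesystem that supports hard links and symlinks.

```python
# Hypothetical imitation of rsync --link-dest snapshots; not Laurent22's code.
import os
import shutil
import tempfile


def snapshot(source: str, backup_root: str, name: str):
    """Create backup_root/name, hard-linking files unchanged since `latest`."""
    if not os.path.exists(os.path.join(backup_root, "backup.marker")):
        raise RuntimeError("backup.marker missing; refusing to back up here")
    latest = os.path.join(backup_root, "latest")
    prev = os.path.realpath(latest) if os.path.islink(latest) else None
    dest = os.path.join(backup_root, name)
    linked = copied = 0
    for dirpath, _dirs, files in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel_dir), exist_ok=True)  # dirs are always new
        for fname in files:
            rel = os.path.normpath(os.path.join(rel_dir, fname))
            src_file = os.path.join(source, rel)
            new_file = os.path.join(dest, rel)
            old_file = os.path.join(prev, rel) if prev else None
            if old_file and os.path.isfile(old_file):
                s, o = os.stat(src_file), os.stat(old_file)
                if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                    os.link(old_file, new_file)  # unchanged: share the same data
                    linked += 1
                    continue
            shutil.copy2(src_file, new_file)  # new or changed: store a fresh copy
            copied += 1
    if os.path.islink(latest):
        os.remove(latest)
    os.symlink(name, latest)  # relative link to the newest snapshot
    return linked, copied


# Demo: two snapshots of a tiny home directory, with one file edited in between.
root = tempfile.mkdtemp()
src, bk = os.path.join(root, "home"), os.path.join(root, "backups")
os.makedirs(os.path.join(src, "docs"))
os.makedirs(bk)
open(os.path.join(bk, "backup.marker"), "w").close()
with open(os.path.join(src, "docs", "big.bin"), "wb") as f:
    f.write(b"x" * 100_000)
with open(os.path.join(src, "notes.txt"), "w") as f:
    f.write("v1")
first = snapshot(src, bk, "2024-03-01-100000")
with open(os.path.join(src, "notes.txt"), "w") as f:
    f.write("version 2")
second = snapshot(src, bk, "2024-03-01-110000")
a = os.stat(os.path.join(bk, "2024-03-01-100000", "docs", "big.bin"))
b = os.stat(os.path.join(bk, "2024-03-01-110000", "docs", "big.bin"))
latest_target = os.readlink(os.path.join(bk, "latest"))
print(first, second, a.st_ino == b.st_ino, b.st_nlink, latest_target)
shutil.rmtree(root)
```

In the demo, the second snapshot copies only the edited notes.txt. Both copies of big.bin share one inode with a link count of 2, which is why each extra backup folder adds so little to the space taken by the first full backup.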
I highly encourage you to check out Laurent22's GitHub project and take advantage of his script, which uses rsync to replicate Time Machine's process of backing up your data. You won't use rsync by itself ever again for backing up your personal data.