RSYNC as a Backup Server: Simple, Powerful and Flexible

Summary

I will talk about using RSYNC to back up files (notes, media, etc.) from multiple devices to a server. I will cover its benefits, how to set it up, and examples of its extensibility and flexibility.

Pre-requisites

There are no pre-requisites for reading and understanding this article. However, to actually implement this, you do need some kind of self-hosted server, and it helps to have some familiarity with the command line or basic system administration (but you can probably get away without it if you are determined).

Introduction: Why Have a Backup Server?

With our increasing reliance on digital data, resilience against accidental or malicious loss matters more than ever. No one likes to lose photos full of memories that they hold dear!

Saving your data on just a single device (for example, your smartphone) without backups puts you at risk of losing this data. This can happen for a number of reasons:

  • hardware failure
  • losing the device
  • disasters (electric surge, fires or floods).

The Solution? Backing up your data and replicating it to another device!

Now, my post is not talking about a typical backup solution. Typical backup solutions are usually:

  • Third party services: Google Drive, Apple iCloud, Dropbox, etc.
  • full-featured self-hosted services: NextCloud, SeaFile

Instead, we will talk about another method using RSYNC, a self-hosted solution that grants you flexibility, customization, and increased control, at the expense of being a more DIY solution. We will explore that in detail shortly!

With that out of the way, let's talk more about RSYNC.

What is RSYNC?

Basic Usage

RSYNC is "a fast, versatile, remote (and local) file-copying tool" according to its manual page.

In the simplest case, rsync is similar to the cp Unix utility: it can copy a file from one path to another.

rsync /path/to/file /path/to/destination

Remote Transfer Capability

One of rsync's key capabilities is its ability to copy or transfer files over a network connection.

This is crucial for our backup system, since we will be backing up files to a separate device, and one of the best ways to do this is a network transfer.

rsync must be installed on both devices.

There are two ways rsync can transfer files remotely:

  1. One device must be running the rsync daemon, and the second device runs the rsync command
  2. One device runs the rsync command and uses SSH for the connection. The only requirement is that the second device has rsync installed and ssh working.

For more information and details on how to transfer files remotely, please refer to rsync's manual page or the many resources on the internet that cover this.

RSYNC Replicates Filesystem State

Although basic usage shows rsync to act very much like cp, it can be a lot more powerful. I found it very helpful to think of rsync as a tool to replicate the filesystem state (or part of said state) between one path and another (on one or two different devices) rather than just a basic copying tool. This is different than the working mechanism of the cp utility. This may not make sense yet, so let me demonstrate.

Suppose I have two different paths. One path contains a set of files and the other contains another set of files, but there is some overlap:

ls /path/to/dir1
#output:
file1 file2

ls /path/to/dir2
#output:
file2 file3

As we can see, file2 is present in both directories, while file1 is only present in dir1 and file3 is only present in dir2.

If we use the cp command, as in:

cp --recursive /path/to/dir1/. /path/to/dir2/

ls /path/to/dir2
# output
file1 file2 file3

The cp command copies all files in dir1 to dir2, and any file in dir2 with the same name (file2 here) is overwritten.

Here is what rsync could have done instead:

  1. Only copy files from dir1 that are not in dir2 (no overwriting). Ignore any files that are present in both directories (even if content is different!)
  2. Only copy files from dir1 that are not in dir2. For files that are present in both directories, copy them over only if the content is different. You can be far more granular than this, for example copying a file only if its modification timestamp is more recent.
  3. Delete files in dir2 if they are not present in dir1.

This is part of what I mean by "replicating state" of a filesystem. It is not only copying files, it is defining a specific state that the filesystem should be in.

And there are many more options! (You can find them all in the rsync manual page.) You can ask rsync to copy and preserve file ownership, modification timestamps, and file permissions, or you can ask it to change file permissions. RSYNC grants you very fine-grained control over the process, and the possibilities are endless!

I did not go too in-depth here, as my goal is only to quickly showcase the powers of RSYNC and convince you it is fit for this job. There are many resources on using RSYNC out there, including its detailed manual page, so I urge you to check them out.

RSYNC's Diffing Algorithm and Network Performance

As we saw in the previous section, RSYNC is able to skip copying certain files based on certain conditions (they are already present in the destination path, or their content is identical, or they have not been edited since the last sync, etc).

RSYNC decides what to skip with a check that is very fast (and configurable): by default it compares each file's size and modification time, and for files that did change, its delta-transfer algorithm can send only the parts that differ. This is very useful for our use case. Suppose you have a photo gallery on your smartphone that you want to back up and sync to your backup server. If every backup copied EVERY FILE over the network, it would take a while and be inefficient. RSYNC can instead copy over only the files that are not already present on the server. This way, it transfers far fewer files, and the process is much faster.

Given RSYNC has fairly low overhead, this makes the transfers quite fast.

Why RSYNC? Why not NextCloud or Other Self-Hosted Solutions?

RSYNC is not the best option for everyone, and other alternatives may very well be a more appropriate option for you.

Reasons not to use RSYNC

  • it is a backup solution only. It cannot help you browse your backed up files, delete them, share them, etc.
  • more DIY than alternatives (if you don't like DIY)
  • Primarily CLI-based (if you don't like CLIs)
  • can lead to issues if mis-configured
  • By default, may not have a feature you want, but its customizability makes any missing feature easier to DIY

It is important to note that for many people, those are pros, not cons.

Why RSYNC?

  • Focuses on one task and optimizes it really well (Unix philosophy)
  • fast and performant
  • versatile, flexible, and rich with features and CLI options
  • Built-in support for SSH for traffic encryption and remote user authentication
  • widely available with minimal requirements. Pre-installed on many linux distributions!
  • skips files that are already on the destination, so only new or changed files are transferred
  • the directory structure of transferred files is under user's control

Why not SyncThing?

Syncthing is an open source remote file sync program. Syncthing can run on multiple devices and synchronize files in one or several directories across several devices (no need for a centralized server).

Syncthing is great! It is not wrong to use Syncthing; it all depends on your needs. While it has a lot of overlap with rsync for our use case, there are some subtle differences:

  • Syncthing replicates the exact state of a directory between different devices. RSYNC grants you fine-grained control over what to copy, as discussed in more detail earlier.
  • With Syncthing, deleting a file on one device deletes it on all synced devices. With RSYNC that behavior is optional, so you can prevent accidental deletions.
  • Syncthing is encrypted using TLS, and has strong authentication. RSYNC can optionally be encrypted and authenticated with SSH, but otherwise needs a custom solution for encryption and authentication.
  • Syncthing has a lot of features, such as file versioning, that RSYNC does not include and would need a separate program for.
  • RSYNC is much more scriptable and interoperable with other Linux programs and utilities.

Now let's get down to business and implement our RSYNC backup system!

Basic Architecture for RSYNC Backups

Here is what we need:

  • a server with rsync installed (and either ssh daemon or rsync daemon running)
  • a device with data you want to back up, and rsync installed
  • a network connecting both devices (this could be your home wifi)
  • invoke rsync to back up! (or set it up to be invoked automatically)
  • ... profit?

Wait, is it really that simple? That's it? Yeah! Well, kinda ... This is more of a basic (but working!) example. It is quite sufficient, but also leaves you a lot of room to build on it. First, we will follow those steps and set up something basic, then explore some options for customizations or add-ons (though I can't go through all of them, because the possibilities are endless!)

Installing RSYNC

Linux

If you are using Linux, there is a decent chance rsync is already installed. Just try:

rsync --version

If it does not error, you're in luck!

Otherwise, you can install it using your distribution's package manager. Some examples below:

# Debian, Ubuntu and some of their derivatives
sudo apt install rsync

# Arch and derivatives
sudo pacman -S rsync

# Gentoo
emerge rsync

I bet it is possible to install it by other means. You can probably find resources on this on the internet.

Android using Syncopoli

There are two ways I know to run rsync on Android: Syncopoli and Termux.

Syncopoli is an Android-native client for the rsync protocol.

  • Has a GUI
  • can schedule the triggering of the rsync job
  • has most of rsync's features, but is not one-to-one with native rsync, so some features (most of them are uncommon) are not available.

Android using Termux

The second way to use rsync on Android is using Termux.

Termux is amazing. It is a terminal emulator for Android that lets you explore the world of the Linux terminal from your Android device, and this includes rsync.

  • CLI instead of GUI. This means it requires more typing, but that can be automated.
  • Access to many other Linux programs and utilities, and can easily chain rsync with them.
  • Can be scheduled with crontab and Tasker
  • You get the same rsync as the one on Linux with all its features

Other

You can probably find installation guides for other platforms. I will not include them here. The Android options are a bit difficult to find, so I thought it would be good to include them here.

Configuring Server-Side RSYNC

There are two basic ways to set up RSYNC on the server side:

  • Server must have SSH Daemon setup and be SSH-able
  • Server must have rsync daemon running

Both methods have pros and cons.

SSH

  • SSH gives the rsync transfer encryption and authentication by default, which is a big security benefit.
  • SSH requires that the client have access to a user account with shell access. Depending on your security model, this may be okay, or may be less secure.
  • SSH may be slower due to the encryption
  • requires no rsync-specific setup. If ssh is working and rsync is installed, clients can begin using immediately.

RSYNC Daemon

  • Has many features that are otherwise unavailable (such as chroot)
  • does not require shell access, which can be a security benefit depending on security model
  • transfers unencrypted. This can be mitigated in many ways, such as with secure tunnels (SSH tunnel, TLS tunnel, etc), but that is a more complex setup
  • weak authentication with weakly encrypted password exchange. Can also be mitigated like above
  • requires a dedicated daemon to be running in the background.

Setting up SSH is out of scope for this post. Many readers have it set up already, and there are many guides on the internet that explain how to do it.

Setting up RSYNC Daemon

To set up the rsync daemon, you first need to write an rsyncd.conf file to configure it.

Here is an example rsyncd.conf file with some descriptive comments:

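# Global parameters (above all modules) apply to every module.
# Do not chroot into the module path; chroot needs the daemon to run as root.
use chroot = false

# A module that clients address by the name "modulename".
[modulename]
# Directory on the server that this module exposes.
path = /path/to/destination
# Allow clients to upload; modules are read-only by default.
read only = false

[module2]
...

In the example above, [modulename] defines a module that clients can interact with. You can have multiple modules. A module is usually associated with one path for clients to rsync with, and a specified set of rules (such as our read only = false). Any rules defined under [modulename] apply only to that module. Rules that are defined above all the modules apply to all the modules (such as our use chroot).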

I highly recommend exploring the manual page for rsync and rsyncd.conf. There are many options that you will find useful.
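As a taste of those options, here is a slightly fuller rsyncd.conf sketch, written to a temporary file from the shell so it is easy to experiment with. The module name, paths, user name and address range are all hypothetical placeholders. The parameters themselves (max connections, comment, hosts allow, auth users, secrets file) are described in the rsyncd.conf manual page, which remains the authority on their exact behavior.

```shell
# Write an example configuration to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Global parameters
use chroot = false
max connections = 4

[phone-photos]
comment = Photo backups from my phone
path = /path/to/destination/photos
read only = false
# Only accept clients from the home network (placeholder range).
hosts allow = 192.168.1.0/24
# Require a user name and password, stored as "user:password" lines
# in the secrets file, which should not be readable by other users.
auth users = phoneuser
secrets file = /path/to/rsyncd.secrets
EOF

echo "wrote $conf"

# To try it, after replacing the placeholder paths, something like:
#   rsync --daemon --no-detach --config="$conf"
```

Keep in mind that, as noted earlier, this password exchange is weak on its own, so it complements a secure tunnel and does not replace one.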

Before jumping to the next section, make sure you create the directory where the client will sync data to, if it does not exist already.

Invoking from Client Side (Linux or Termux)

Now that we have rsync installed, we can invoke it!

If your server is using SSH, then we can invoke rsync like:

rsync [options] /path/to/local/directory remoteuser@hostname:/path/to/remote/directory

If your server is using rsync daemon, you can instead do:

rsync [options] /path/to/local/directory rsync://hostname:port/modulename

There are other ways of invoking rsync that are documented in the manual pages.

What should we put in options? Well, that is up to you. One very common option is --archive (-a), which combines several of the most commonly used flags: it recurses into directories and preserves symbolic links, permissions, modification times, and (where the receiving user is allowed to) ownership. Regardless of --archive, rsync by default transfers only files that are missing from the destination or that differ from the destination copy in size or modification time. Files that look identical are not transferred.
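Two habits are worth picking up before running a real backup: previewing with --dry-run, and paying attention to the trailing slash on the source path. The sketch below demonstrates both with throwaway local directories (the directory and file names are made up). The same source-path rules apply when the destination is an SSH or rsync:// target.

```shell
work=$(mktemp -d)
mkdir -p "$work/notes/school" "$work/with-slash" "$work/without-slash"
echo "todo" > "$work/notes/school/todo.txt"

# Preview only: --dry-run lists what would be copied without writing anything.
rsync --archive --verbose --dry-run "$work/notes/" "$work/with-slash/"

# Trailing slash on the source: copy the *contents* of notes.
rsync --archive "$work/notes/" "$work/with-slash/"

# No trailing slash: copy the notes directory itself into the destination.
rsync --archive "$work/notes" "$work/without-slash/"

ls "$work/with-slash"      # school
ls "$work/without-slash"   # notes
```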

Conclusion and Future Considerations

Well there we have it! We have made a basic backup setup with rsync and discussed its benefits and weaknesses. This setup, although basic, is still very powerful and capable, but there is still a lot that we can improve:

  • Scheduling: This can be done with crontab or systemd (Linux) and Tasker (Android)
  • Trigger by file watch: Instead of invoking rsync at time intervals, you can use a file watcher that invokes rsync when a file is changed. This is particularly useful for notes. This is doable on Linux but I am not sure if it is possible on Android.
  • Viewing Remote Files: We can use programs to view our files on the server from a remote client, such as a self-hostable image and video gallery.
  • Use unprivileged user (security enhancement): this can be done by using a locked-down user on the server side, so that the client can't mistakenly do too much damage
  • Run RSYNC Daemon with unprivileged user: many run the rsync daemon as root. Some (like me) run it as an unprivileged user. This means you lose the chroot feature, but a potential vulnerability in rsync would be less destructive.
  • Use secure tunnel with RSYNC Daemon: As we discussed, the rsync daemon is unencrypted and uses weak authentication. This is usually less of a problem for local network transfers, but you can use secure tunnels to mitigate it. Whether it is a VPN, an SSH tunnel, or another option, rsync can work with them all! I might write a post about this as information about it is a bit scarce on the internet.
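For the scheduling idea, one possible shape is a small wrapper that a scheduler calls. The sketch below is only an assumption about how you might structure it: run_backup is a hypothetical helper, the lock directory name is arbitrary, and the crontab line in the comments uses a made-up script path. It is demonstrated against local throwaway directories; in real use the destination would be your SSH or rsync:// target.

```shell
# run_backup SRC DEST: hypothetical helper, not part of rsync.
# A lock directory stops a new run from starting while an old one is active.
run_backup() {
    lock="${TMPDIR:-/tmp}/rsync-backup.lock"
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous backup still running, skipping" >&2
        return 1
    fi
    rsync --archive "$1" "$2"
    status=$?
    rmdir "$lock"
    return "$status"
}

# Local demonstration with throwaway directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "meeting notes" > "$src/notes.md"
run_backup "$src/" "$dst/"
ls "$dst"

# Example crontab entry (hourly), assuming the function and a call to it
# are saved in a script at the hypothetical path /home/me/bin/backup.sh:
#   0 * * * * /home/me/bin/backup.sh
```

The lock matters because a slow transfer over wifi can easily outlast the interval between scheduled runs.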

Thank you for reading! This is one of my first posts, so please give me feedback if you have it.