rsync, Part I

rsync makes efficient use of the network by only transferring the parts of files that are different from one host to the next. Here's how to use it securely.

Andrew Tridgell's rsync is a useful file-transfer tool, one that has no encryption support of its own but is easily ``wrapped'' (tunneled) by encryption tools such as SSH and Stunnel. What differentiates rsync (which, like scp, is based on rcp) is that it has the ability to perform differential downloads and uploads of files.

For example, if you wish to update your local copy of a 10MB file, and the newer version on the remote server differs in only three places totaling 150KB, rsync will automatically download only the differing 150KB (give or take a few KB) rather than the entire file. This functionality is provided by the rsync algorithm, invented by Andrew Tridgell and Paul Mackerras, which rapidly creates and compares rolling checksums of both files and thus determines which parts of the new file to download and add/replace on the old one.
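To make the idea of rolling checksums concrete, here is a minimal Python sketch of a weak checksum modeled on the one in the rsync technical report, plus a toy matcher built on it. It is an illustration under assumptions, not rsync's actual code: the function names are hypothetical, SHA-256 stands in for rsync's strong per-block hash, and unlike rsync the matcher keeps sliding one byte at a time even after it finds a match.

```python
import hashlib

M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block):
    """Compute the two running sums (a, b) for one block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(old, new, n):
    """Return (offset_in_new, offset_in_old) for old blocks that reappear in new."""
    table = {}
    for off in range(0, len(old) - n + 1, n):
        block = old[off:off + n]
        strong = hashlib.sha256(block).digest()  # stand-in strong hash
        table.setdefault(weak_checksum(block), []).append((strong, off))

    matches = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    pos = 0
    while True:
        # The cheap weak checksum filters candidates; the strong hash confirms.
        for strong, off in table.get((a, b), []):
            if hashlib.sha256(new[pos:pos + n]).digest() == strong:
                matches.append((pos, off))
                break
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return matches


print(find_matches(b"abcdefghijklmnop", b"XXabcdYYmnop", 4))
# prints [(2, 0), (8, 12)]
```

Only the bytes of the new file that fall outside the matched blocks (here XX and YY) would need to cross the network; everything else can be rebuilt from blocks the other side already holds.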

Because this is a much more efficient use of the network, rsync is especially useful over slow network connections. It does not, however, have any performance advantage over rcp in copying files that are completely new to one side or the other of the transaction. By definition, differential copying requires that there be two files to compare.

In summary, rsync is by far the most intelligent file-transfer utility in common use, one that is both amenable to encrypted sessions and worth the trouble of learning to use that way. Using rsync securely is the focus of the remainder of this article.

rsync supports a long list of options, most of them relevant to specific aspects of maintaining software archives, mirrors and backups. Only those options directly relevant to security will be covered in depth here, but the rsync(8) man page will tell you anything you need to know about these other features.

Getting, Compiling and Installing rsync

Because Andrew Tridgell, rsync's original lead developer, is also one of the prime figures in the Samba Project, rsync's home page is part of the Samba web site, rsync.samba.org. That, of course, is the definitive source of all things rsync. The resources page, rsync.samba.org/resources.html, has links to some excellent off-site rsync documentation.

The latest rsync source code is available at rsync.samba.org/ftp/rsync, with binary packages for Debian, LinuxPPC and Red Hat Linux at rsync.samba.org/ftp/rsync/binaries. rsync is already considered a standard Linux tool and therefore is included in all popular Linux distributions; you probably needn't look further than the Linux installation CD-ROMs to find an rsync package for your system.

However, there are security bugs in the zlib implementation included in rsync prior to v2.5.4. These bugs apply regardless of the version of your system's shared zlib libraries. There is also an annoying bug in v2.5.4 itself, which sometimes causes rsync to copy whole files needlessly. I therefore recommend you run no version earlier than rsync v2.5.5.

Happily, compiling rsync from source is fast and easy. Simply unzip and untar the archive, change your working directory to the top-level directory of the source code, type ./configure, and if this script finishes without errors, type make && make install.

Running rsync over SSH

Once rsync is installed, you can use it several ways. The first and most basic is to use rcp as the transport, which requires any host to which you connect to have the shell service (i.e., in.rshd) enabled in inetd.conf. Don't do this! The Secure Shell was invented precisely because the ``r'' services (rcp, rsh and rlogin) completely lack support for strong authentication, which led to their being used as entry points by many successful intruders over the years.

Therefore, I won't describe how to use rsync with rcp as its transport. However, you may wish to use this method between hosts on a trusted network; if so, ample information is available in both rsync's and in.rshd's respective man pages.

A much better way to use rsync than the rcp method is by specifying the Secure Shell as the transport. This requires that the remote host be running sshd and that the rsync command be present on both hosts (and in their default paths). If you haven't set up sshd yet, do that first.

Suppose you have two hosts, near and far, and you wish to copy the local file thegoods.tgz to far's /home/near.backup directory, which you think may already contain an older version of thegoods.tgz. Assuming your user name, yodeldiva, exists on both systems, the transaction might look like this:

yodeldiva@near:~> rsync -vv -e ssh ./thegoods.tgz far:~

Let's dissect the command line. rsync has only one binary executable, rsync, which is used both as the client command and, optionally, as a dæmon. In this example, it's present on both near and far, but it runs as a dæmon on neither; sshd is acting as the listening dæmon on far.

The first rsync option in the above example is -vv, which is the nearly universal UNIX shorthand for ``very verbose''. It's optional, but instructive. The second option is -e, with which you can specify an alternative to rsync's default remote-copy program rcp. Because rcp is the default, and because rcp and SSH are the only supported options, -e is used to specify SSH in practice.

Perhaps surprisingly, -e scp will not work, because prior to copying any data, rsync needs to pass a remote rsync command via SSH to generate and return rolling checksums on the remote file. In other words, rsync needs the full functionality of the ssh command to do its thing, so specify this rather than scp if you use the -e option.

After the options come rsync's actionable arguments, the local and remote files. The syntax for these is very similar to rcp's and scp's. If you immediately precede either filename with a colon, rsync will interpret the string preceding the colon as a remote host's name. If the user name you wish to use on the remote system is different from your local user name, you can specify it by immediately preceding the hostname with an @ sign and preceding that with your remote user name. In other words, the full rsync syntax for filenames is the following:

[[username@]hostname:]/path/to/filename

There must be at least two filenames. The right-most must be the destination file or path, and the others must be source files. Only one of these two may be remote, but both may be local (colon-less), which lets you perform local differential file copying--this is useful if, for example, you need to back up files from one local disk or partition to another.

The source file specified is ./thegoods.tgz, an ordinary local file path, and the destination is far:~, which translates to ``my home directory on the server far''. If your user name on far is different from your local user name, say yodelerwannabe rather than yodeldiva, use the destination yodelerwannabe@far:~.
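The filename syntax described above can be sketched as a small Python parser. This is only an illustration of the [[username@]hostname:]path rule, not how rsync itself parses arguments; the function name parse_spec is hypothetical, and the sketch assumes that a colon appearing after a slash belongs to a local filename.

```python
def parse_spec(spec):
    """Split an rsync-style file argument into (user, host, path).

    user and host are None for a purely local path.
    """
    head, sep, path = spec.partition(":")
    if not sep or "/" in head:
        # No colon at all, or the colon sits inside a directory path:
        # treat the whole argument as a local filename.
        return None, None, spec
    user, at, host = head.rpartition("@")
    return (user if at else None), host, path


for arg in ("./thegoods.tgz", "far:~", "yodelerwannabe@far:~"):
    print(arg, "->", parse_spec(arg))
```

A source and destination pair is valid for the SSH method only if at most one of the two parsed hosts is not None.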

Setting Up an rsync Server

Using rsync with SSH is the easiest way to use rsync securely with authenticated users, in a way that both requires and protects the use of real users' accounts. But as I mentioned earlier in the ``SFTP and SSH'' section [of the book, see Sidebar], SSH doesn't lend itself easily to anonymous access. What if you want to set up a public file server that supports rsync-optimized file transfers?

This is quite easy to do. Create a simple /etc/rsyncd.conf file, and run rsync with the --daemon flag (i.e., rsync --daemon). The devil, however, is in the details. You should configure /etc/rsyncd.conf very carefully if your server will be connected to the Internet or any other untrusted network. Let's discuss how.

rsyncd.conf has a simple syntax; global options are listed at the beginning without indentation. Modules, which are groups of options specific to a particular filesystem path, are indicated by a square-bracketed module name followed by indented options.

Option lines each consist of the name of the option, an equal sign and one or more values. If the option is boolean, allowable values are yes or no (don't be misled by the rsyncd.conf(5) man page, which in some cases refers to true and false). If the option accepts multiple values, these should be comma-space delimited, for example, option1, option2.

Listing 1 is part of a sample rsyncd.conf file that illustrates some options particularly useful for tightening security. Although I created it for this purpose, it's a real configuration file and syntactically complete. Let's dissect it.

A Sample rsyncd.conf File

As advertised, the global options are listed at the top. The first option set also happens to be a global-only option: syslog facility, motd file, log file, pid file and socket options may be used only as global settings, not in module settings. Of these, only syslog facility has direct security ramifications. Like the ProFTPD directive SyslogFacility, rsync's syslog facility can be used to specify which syslog facility rsync should log to if you don't want it to use daemon, its default.

For detailed descriptions of the other global-only options, see the rsyncd.conf(5) man page; I won't cover them here, as they don't directly affect system security, and their default settings are fine for most situations, anyhow.

All other allowable rsyncd.conf options can be used as global options, in modules or both. If an option appears in both the global section and in a module, the module setting overrides the global setting for transactions involving that module. In general, global options replace default values, and module-specific options override both default and global options.
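The precedence rule (defaults, then global options, then module options) can be sketched in a few lines of Python. This is a simplified model for illustration, not rsync's own parser: the function names are hypothetical, the DEFAULTS table lists only the defaults mentioned in this article, and the sample text is an abbreviated configuration in the style of the listings.

```python
# Defaults as described in this article; a real rsync has many more options.
DEFAULTS = {
    "use chroot": "yes",
    "read only": "yes",
    "max connections": "0",
    "timeout": "0",
}

SAMPLE = """\
# global options
max connections = 20
timeout = 600
read only = yes

[public]
  path = /home/public_rsync

[incoming]
  path = /home/incoming
  read only = no
"""


def parse_rsyncd_conf(text):
    """Return (global_options, {module_name: module_options})."""
    global_opts, modules = {}, {}
    current = global_opts
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed name starts a new module; later options belong to it.
            current = modules.setdefault(line[1:-1].strip(), {})
            continue
        name, _, value = line.partition("=")
        current[name.strip()] = value.strip()
    return global_opts, modules


def effective(global_opts, modules, module):
    """Merge defaults, then globals, then the module's own settings."""
    merged = dict(DEFAULTS)
    merged.update(global_opts)
    merged.update(modules[module])
    return merged


g, m = parse_rsyncd_conf(SAMPLE)
print(effective(g, m, "public")["read only"])    # global value applies
print(effective(g, m, "incoming")["read only"])  # module value overrides it
```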

The second group of options falls into the category of module-specific options.

use chroot = yes: if use chroot is set to yes, rsync will chroot itself to the module's path prior to any file transfer, preventing or at least hindering certain types of abuses and attacks. This has the trade-off of requiring that rsync --daemon be started by root, but by also setting the uid and gid options, you can minimize the amount of time rsync uses its root privileges. The default setting is yes.

uid = nobody: the uid option lets you specify with which user's privileges rsync should operate during file transfers, and it therefore affects which permissions will be applicable when rsync attempts to read or write a file on a client's behalf. You may specify either a user name or a numeric user ID. The default is -2, which is nobody on many systems, but not on mine, which is why uid is defined explicitly.

gid = nobody: the gid option lets you specify with which group's privileges rsync should operate during file transfers, and it therefore affects (along with uid) which permissions apply when rsync attempts to read or write a file on a client's behalf. You may specify either a group name or a numeric group ID; the default is -2 (nobody on many systems).

max connections = 20: this limits the number of concurrent connections to a given module (not the total for all modules, even if set globally). If specified globally, this value will be applied to each module that doesn't contain its own max connections setting. The default value is zero, which places no limit on concurrent connections. I do not recommend leaving it at zero, as this makes Denial-of-Service (DoS) attacks easier.

timeout = 600: the timeout also defaults to zero, which in this case also means ``no limit''. Since timeout controls how long (in seconds) rsync will wait for idle transactions to become active again, this also represents a DoS exposure and should likewise be set globally (and per module, when a given module needs a different value for some reason).

read only = yes: the last option defined globally is read only, which specifies that the module in question is read-only, i.e., that no files or directories may be uploaded to the specified directory, only downloaded. The default value is yes.

The third group of options defines the module [public]. These, as you can see, are indented. When rsync parses rsyncd.conf downward, it considers each option below a module name to belong to that module until it reaches either another square-bracketed module name or the end of the file. Let's examine each of the module [public]'s options, one at a time.

[public]: this is the name of the module. No arguments or other modifiers belong here: just the name you wish to call this module, in this case public.

path = /home/public_rsync: the path option is mandatory for each module, as it defines which directory the module will allow files to be read from or written to. If you set the global option use chroot to yes, this is the directory rsync will chroot to prior to any file transfer.

comment = Nobody home but us tarballs: this string will be displayed whenever a client requests a list of available modules. By default there is no comment.

hosts allow = near.echo-echo-echo.org, 10.18.3.12 and hosts deny = *.echo-echo-echo.org, 10.16.3.0/24: you may, if you wish, use the hosts allow and hosts deny options to define Access Control Lists (ACLs). Each accepts a comma-delimited list of FQDNs or IP addresses from which you wish to explicitly allow or deny connections. By default, neither option is set, which is equivalent to ``allow all''. If you specify an FQDN, which may contain the wildcard *, rsync will attempt to reverse-resolve all connecting clients' IP addresses to names prior to matching them against the ACL.

rsync's precise interpretation of each of these options depends on whether the other is present. If only hosts allow is specified, then any client whose IP or name matches will be allowed to connect, and all others will be denied. If only hosts deny is specified, then any client whose IP or name matches will be denied, and all others will be allowed to connect.

If, however, both hosts allow and hosts deny are present, hosts allow will be parsed first, and if the client's IP or name matches, the transaction will be passed.

If the IP or name in question didn't match hosts allow, then hosts deny will be parsed, and if the client matches there, the transaction will be dropped.

If the client's IP or name matches neither, it will be allowed.

In this example, both options are set. They would be interpreted as follows:

  • Requests from 10.18.3.12 will be allowed, while requests from any IP in the range 10.16.3.1 through 10.16.3.254 will be denied.

  • Requests from the host near.echo-echo-echo.org will be allowed, but everything else from the echo-echo-echo.org domain will be rejected. Everything else will be allowed.
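The evaluation order described above can be checked with a short Python sketch. It models only the decision logic as this article states it, not rsync's real matching code: the function names are hypothetical, the reverse-resolved name is passed in by the caller rather than looked up, and pattern matching is approximated with the standard ipaddress and fnmatch modules.

```python
import fnmatch
import ipaddress

ALLOW = ["near.echo-echo-echo.org", "10.18.3.12"]
DENY = ["*.echo-echo-echo.org", "10.16.3.0/24"]


def matches(pattern, ip, name):
    """True if the client's IP or resolved name matches one ACL entry."""
    try:
        # Entries that parse as an address or network are compared by IP.
        return ipaddress.ip_address(ip) in ipaddress.ip_network(pattern, strict=False)
    except ValueError:
        # Anything else is treated as a host name pattern, possibly with *.
        return name is not None and fnmatch.fnmatchcase(name.lower(), pattern.lower())


def is_allowed(ip, name, allow=(), deny=()):
    hit_allow = any(matches(p, ip, name) for p in allow)
    hit_deny = any(matches(p, ip, name) for p in deny)
    if allow and not deny:
        return hit_allow      # only hosts allow: everyone else is denied
    if allow and hit_allow:
        return True           # both present: hosts allow is consulted first
    return not hit_deny       # then hosts deny; no match at all means allowed


for ip, name in [
    ("10.18.3.12", None),
    ("10.16.3.7", None),
    ("192.0.2.1", "near.echo-echo-echo.org"),
    ("192.0.2.2", "far.echo-echo-echo.org"),
    ("192.0.2.3", "example.com"),
]:
    print(ip, name, is_allowed(ip, name, ALLOW, DENY))
```

The 192.0.2.x addresses and example.com are placeholders chosen for the demonstration; they do not appear in the sample configuration.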

ignore nonreadable = yes: any remote file for which the client's rsync process does not have read permissions (see the uid and gid options) will not be compared against the client's local copy thereof. This probably enhances performance more significantly than security; as a means of access control, the underlying file permissions are more important.

refuse options = checksum: the refuse options option tells the server-side rsync process to ignore the specified options if specified by the client. Of rsync's command-line options, only checksum has an obvious security ramification. It tells rsync to calculate CPU-intensive MD5 checksums in addition to its normal rolling checksums, so blocking this option reduces certain DoS opportunities. Although the compress option has a similar exposure, you can use the dont compress option to refuse it rather than the refuse options option.

dont compress = *: you can specify certain files and directories that should not be compressed via the dont compress option. If you wish to reduce the chances of compression being used in a DoS attempt, you also can specify that nothing be compressed by using an asterisk (*), as in our example.

This simple example should get you started offering files for download by rsync. Next month, we'll cover setting up rsync modules (directories) at the filesystem level to accept anonymous uploads and authenticate users.

Building Secure Servers with Linux


rsync, Part II

Setting up rsync modules at the filesystem level and making connections.

Last month we covered setting up an rsync server for anonymous access. Listing 1 shows the sample rsyncd.conf file from last month, illustrating some options particularly useful for tightening security. Returning to our example, here's a word about setting up rsync modules (directories) at the filesystem level. The guidelines for doing this are the same as those for anonymous FTP chroot environments. The only exception is that no system binaries or configuration files need to be copied inside them for chroot purposes, as is the case with some FTP servers.

Listing 1. Sample rsyncd.conf File

The rsync configuration file needs only a little customization of paths and allowed hosts to start serving files to anonymous users. But that's a pretty narrow offering. How about accepting anonymous uploads and adding a module for authenticated users? Listing 2 outlines how to do both.

Listing 2. Additional rsyncd.conf Modules

First, we have a module called incoming, whose path is /home/incoming. The guidelines for publicly writable directories (see ``Tips for Securing Anonymous FTP'' in Building Secure Servers with Linux) apply, but with one important difference: for anonymous rsync, this directory must be world-executable as well as world-writable, that is, mode 0733. If it isn't set this way, file uploads fail without any error being returned to the client or logged on the server.
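As a quick illustration of what mode 0733 means, the following Python sketch creates a directory with those permissions and prints them back. It is an assumption-laden example: it works in a temporary scratch directory rather than the real /home/incoming, and the helper name make_dropbox is hypothetical.

```python
import os
import stat
import tempfile


def make_dropbox(path):
    """Create a directory with mode 0733 and return the mode actually set."""
    os.mkdir(path)
    # chmod explicitly after mkdir so the process umask cannot strip bits.
    os.chmod(path, 0o733)
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as scratch:
    mode = make_dropbox(os.path.join(scratch, "incoming"))
    # Owner gets rwx; group and others get write and execute but not read.
    print(oct(mode), stat.filemode(stat.S_IFDIR | mode))
```

Without the read bit, group and other users can create files in the directory and traverse it, but cannot list what is already there.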

Some tips that apply for configuring FTP are to watch this directory closely for abuse and never make it or its contents world-readable. Also, move uploaded files out of it and into a nonworld-accessible part of the filesystem as soon as possible, perhaps with a cron job.

The only new option in the [incoming] block is transfer logging. This causes rsync to log more verbosely when actual file transfers are attempted. By default, this option has a value of no. In addition, the familiar option read only has been set to no, overriding its global setting of yes. No similar option exists for telling rsync this directory is writable; this is determined by the directory's actual permissions.

The second part of the example defines a restricted-access module named Audiofreakz. There are three new options to discuss here. The first option, list, determines whether this module should be listed when remote users request a list of the server's available modules. Its default value is yes.

The other two new options, auth users and secrets file, define how prospective clients should be authenticated. rsync's authentication mechanism, available only when run in dæmon mode, is based on a reasonably strong 128-bit MD5 challenge-response scheme. This is superior to standard FTP authentication for two reasons. First, passwords are not transmitted over the network and therefore are not subject to eavesdropping attacks. Brute-force hash-generation attacks against the server are theoretically feasible, however.

Second, rsync doesn't use the system's user credentials; it has its own file of user name-password combinations. This file is used only by rsync and is not linked or related in any way to /etc/passwd or /etc/shadow. Thus, even if an rsync login session is somehow compromised, no user's system account is directly threatened or compromised unless you've made some poor choices regarding which directories to make available using rsync or when setting those directories' permissions.

Like FTP, however, data transfers themselves are unencrypted. At best, rsync authentication validates the identities of users, but it does not ensure data integrity or privacy against eavesdroppers. To achieve those goals you must run it over either SSH or Stunnel.
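The general shape of a challenge-response exchange can be sketched as follows. This is a generic illustration of the idea, assuming a hash of the password combined with a random server challenge; it is not rsync's wire protocol, and the function names and the exact way the password and challenge are combined are assumptions made for the example.

```python
import hashlib
import hmac
import os


def make_challenge():
    """Server side: a fresh random challenge for every login attempt."""
    return os.urandom(16)


def response(password, challenge):
    """Client side: prove knowledge of the password without sending it."""
    return hashlib.md5(password.encode() + challenge).hexdigest()


def verify(stored_password, challenge, client_response):
    """Server side: recompute the expected response and compare safely."""
    expected = response(stored_password, challenge)
    return hmac.compare_digest(expected, client_response)


challenge = make_challenge()
print(verify("shyneePAT3", challenge, response("shyneePAT3", challenge)))
print(verify("shyneePAT3", challenge, response("wrong-guess", challenge)))
```

Because the challenge changes on every attempt, a captured response cannot simply be replayed later. An eavesdropper who records both challenge and response can still test password guesses offline, and the file data that follows is not protected at all.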

The secrets file option specifies the path and name of the file containing rsync user name-password combinations. By convention, /etc/rsyncd.secrets commonly is used, but the file may have practically any name or location--it needn't end, for example, with the suffix .secrets. This option also has no default value; if you wish to use auth users, you also must define secrets file. This example shows the contents of a sample secrets file:

watt:shyneePAT3
bell:d1ngplunkB00M!

Contents of a Sample /etc/rsyncd.secrets File

The auth users option in Listing 2 defines which users, among those listed in the secrets file, may have access to the module. All clients who attempt to connect to this module, assuming they pass any applicable hosts allow and hosts deny ACLs, are prompted for a user name and password. Remember to set the permissions of the applicable files and directories carefully, because these ultimately determine what authorized users may do once they've connected. If auth users is not set, users are not required to authenticate, and the module is available over anonymous rsync. This is rsync's default behavior in dæmon mode.
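How auth users and the secrets file work together can be sketched in Python. This is a simplified model, not rsync's implementation: the function names are hypothetical, the secrets text is the sample shown above, and skipping blank lines and lines that start with # is an assumption of the sketch.

```python
SECRETS_TEXT = """\
watt:shyneePAT3
bell:d1ngplunkB00M!
"""


def load_secrets(text):
    """Parse user:password lines into a dictionary."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so passwords may contain colons.
        user, sep, password = line.partition(":")
        if sep:
            secrets[user] = password
    return secrets


def may_authenticate(user, auth_users, secrets):
    """A user needs to be named in auth users AND present in the secrets file."""
    allowed = [u.strip() for u in auth_users.split(",")]
    return user in allowed and user in secrets


secrets = load_secrets(SECRETS_TEXT)
print(may_authenticate("watt", "watt,bell", secrets))
print(may_authenticate("mallory", "watt,bell", secrets))
```

The user name mallory is a placeholder for someone not listed in either place.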

And that is most of what you need to know to set up both anonymous and authenticated rsync services. See the rsync(8) and rsyncd.conf(5) man pages for full lists of command-line and configuration-file options, including a couple I haven't covered here that can be used to customize log messages.

Using rsync to Connect to an rsync Server

Lest I forget, I haven't explained how to connect to an rsync server as a client. This is a simple matter of syntax; when specifying the remote host, use a double colon rather than a single colon and use a path relative to the desired module, not an absolute path.

For instance, to revisit the scenario in last month's example, in which the client system is called near and the remote system is called far, suppose you wish to retrieve the file newstuff.tgz and far is running rsync in dæmon mode. Suppose further that you can't remember the name of the module on far in which new files are stored. First, you can query far for a list of its available modules, as shown below:

[root@near darthelm]# rsync far::
public Nobody home but us tarballs
incoming You can put, but you can't take

(Not coincidentally, these are the same modules we set up in this month's examples; as I predicted in the previous section, the module Audiofreakz is omitted.) The directory you need is named public. Assuming you're right, the command to copy newstuff.tgz to your current working directory would look like this:

[root@near ~]# rsync far::public/newstuff.tgz .

Both the double colon and the path format differ from SSH mode. Whereas SSH expects an absolute path after the colon, the rsync dæmon expects a module name, which acts as the ``root'' of the file's path. To illustrate, let's look at the same command using SSH mode:

[root@near ~]# rsync -e ssh far:/home/public_rsync/newstuff.tgz .

These two aren't exactly equivalent, of course; whereas the rsync dæmon process on far is configured to serve files in this directory to anonymous users (i.e., without authentication), SSH always requires authentication (although this can be automated using null-passphrase RSA or DSA keys, described in Chapter 4 of Building Secure Servers with Linux). But it does show the difference between how paths are handled.
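The difference in path handling can be modeled with a short Python sketch that classifies a remote argument and maps a module-relative path onto the server's filesystem. It is illustrative only: the function names are hypothetical, the module table is a hand-written stand-in for what the dæmon reads from rsyncd.conf, and no attempt is made to reject paths that climb out of the module.

```python
import posixpath

# Stand-in for the dæmon's view of its configuration (path taken from Listing 1).
MODULE_PATHS = {"public": "/home/public_rsync"}


def classify(spec):
    """Tell dæmon-mode, remote-shell and local arguments apart."""
    head, sep, rest = spec.partition("::")
    if sep and "/" not in head:
        module, _, relpath = rest.partition("/")
        return ("daemon", head, module, relpath)
    head, sep, rest = spec.partition(":")
    if sep and "/" not in head:
        return ("shell", head, rest)
    return ("local", spec)


def daemon_to_server_path(module_paths, module, relpath):
    """The module name acts as the root of the path the client asked for."""
    return posixpath.join(module_paths[module], relpath)


kind, host, module, relpath = classify("far::public/newstuff.tgz")
print(kind, host, daemon_to_server_path(MODULE_PATHS, module, relpath))
print(classify("far:/home/public_rsync/newstuff.tgz"))
```

Both arguments end up naming the same file on far; only the dæmon form hides the server's real directory layout from the client.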

Tunneling rsync with Stunnel

The last rsync usage I'll mention is the combination of rsync, running in dæmon mode, with Stunnel. Stunnel is a general-purpose TLS or SSL wrapper that can be used to encapsulate any simple TCP transaction in an encrypted and optionally X.509-certificate-authenticated session. Although rsync gains encryption when you run it in SSH mode, it loses its dæmon features, most notably anonymous rsync. Using Stunnel gives you encryption as good as SSH's, while still supporting anonymous transactions.

Stunnel is covered in depth in Chapter 5 of Building Secure Servers with Linux, using rsync in most examples. Suffice it to say that this method involves the following steps on the server side:

  1. Configure rsyncd.conf as you normally would.

  2. Invoke rsync with the --port option, specifying some port other than 873 (e.g., rsync --daemon --port=8730).

  3. Set up a Stunnel listener on TCP port 873 to forward all incoming connections on TCP 873 to the local TCP port specified in the previous step.

  4. If you don't want anybody to connect ``in the clear'', configure hosts.allow to block nonlocal connections to the port specified in Step 2. In addition, or instead, you can configure iptables to do the same thing.

On the client side, the procedure is as follows:

  1. As root, set up a Stunnel listener on TCP port 873 (assuming you don't have an rsync server on the local system already using it), which forwards all incoming connections on TCP 873 to TCP port 873 on the remote server.

  2. When you wish to connect to the remote server, specify localhost as the remote server's name. The local Stunnel process now opens a connection to the server and forwards your rsync packets to the remote Stunnel process. The remote Stunnel process decrypts your rsync packets and delivers them to the remote rsync dæmon. Reply packets, naturally, are sent back through the same encrypted connection.

As you can see, rsync itself isn't configured much differently in this scenario than anonymous rsync would be--most of the work is in setting up Stunnel forwarders.
Ref::
http://www.linuxjournal.com/article/6475
http://www.linuxjournal.com/article/6508

rsync Works Two Ways

It may seem odd and even confusing that rsync appears to rely on other commands to move files. Is it a file-transfer utility, or isn't it? The answer is an emphatic yes.

First, rsync can operate without the assistance of external transport mechanisms if your remote host is running rsync in dæmon mode. rsync even has its own privileged listening port for this purpose: TCP 873.

Second, remember that rsync was invented not because existing methods couldn't move data packets efficiently, but because existing methods didn't have the intelligence to determine which data packets or how many data packets actually need moving in the first place. rsync adds this intelligence to SSH and rcp without, as it were, reinventing the packet-moving wheel.


Listing 1. A Sample rsyncd.conf File

# "global-only" options
syslog facility = local5

# global options which may also be defined
# in modules
use chroot = yes
uid = nobody
gid = nobody
max connections = 20
timeout = 600
read only = yes

# a module:
[public]
  path = /home/public_rsync
  comment = Nobody home but us tarballs
  hosts allow = near.echo-echo-echo.org, 10.18.3.12
  hosts deny = *.echo-echo-echo.org, 10.16.3.0/24
  ignore nonreadable = yes
  refuse options = checksum
  dont compress = *

Listing 2. Additional rsyncd.conf Modules

[incoming]
  path = /home/incoming
  comment = You can put, but you can't take
  read only = no
  ignore nonreadable = yes
  transfer logging = yes

[audiofreakz]
  path = /home/cvs
  comment = Audiofreakz CVS repository
  list = no
  auth users = watt,bell
  secrets file = /etc/rsyncd.secrets

What about Recursion?

I've alluded to rsync's usefulness for copying large bodies of data, such as software archives and CVS trees, but all my examples in this chapter show single files being copied. This is because my main priority is showing how to configure and use rsync securely.

I leave it to you to explore the many client-side (command-line) options rsync supports, as fully documented in the rsync(8) man page. Particularly noteworthy are -a (or --archive), which is actually shorthand for -rlptgoD and specifies recursion of most file types (including devices and symbolic links), and -C (or --cvs-exclude), which tells rsync to use CVS-style file-exclusion criteria in deciding which files not to copy.