File synchronization

(Thanks to Dan Suciu for preparing this)

Unison is a file-synchronization tool for Unix and Windows, written by Benjamin Pierce at the University of Pennsylvania. Although it carries a warning that it does not yet handle 'resource forks' on Macs correctly, I have been using it on Macs for several months without problems. The home page is http://www.cis.upenn.edu/~bcpierce/unison/.

Why

Unison is fantastic: it lets you have all your files with you all the time. This means on your office desktop, on your home desktop, on the laptop that you take on the plane, on the department's Unix servers (where somebody actually remembers to back them up), or wherever you want your files to be. You can have your files on PCs, on Linux boxes, on Unix servers, and on Macs. And I mean you keep copies of all your files, not just the "important" ones, or the "current" ones, or the ones that you need on that trip.

Of course, if you prefer, you can synchronize only part of your directory structure, but I see little reason to do that. There is a configuration file where you can tell unison to skip files or directories that make no sense to synchronize, e.g. *.tmp, or *.exe, or temp/*.

I keep copies of all my files (3.5GB) on Macs, on a Windows PC, and on the department's Unix file servers: it takes less than 20 seconds to synchronize a pair of computers (plus the time to actually copy any files), a little more from Windows. Since I started using it, it has changed the way I work at home and on the road: I always have all my files with me without planning in advance which ones to copy.

Unison uses an rsync-style protocol to transfer files efficiently, but in addition it tracks which files you changed where, and makes sure each one is copied in the right direction. For example, you can modify file A at the office and file B on your laptop. When you synchronize, unison copies file A in one direction and file B in the other. If you do something stupid like modifying file A on both machines, it will prompt you and ask what to do.
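The direction logic described above can be sketched in a few lines of Python. This is only an illustration and not Unison's actual code (Unison itself is written in OCaml). It assumes each machine is summarized as a mapping from path to a content fingerprint, and that a third mapping remembers the state at the last successful synchronization. The names `reconcile`, `archive`, `office` and `laptop` are made up for the example.

```python
def reconcile(archive, a, b):
    """Decide, per path, what a two-way synchronizer should do.

    archive: path -> fingerprint at the last successful sync (assumed)
    a, b:    path -> fingerprint currently on each replica
    A missing key means the file does not exist on that side.
    """
    actions = {}
    for path in sorted(set(archive) | set(a) | set(b)):
        old, in_a, in_b = archive.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            continue  # both replicas already agree, nothing to do
        if in_a != old and in_b == old:
            actions[path] = "a->b"      # only replica a changed
        elif in_b != old and in_a == old:
            actions[path] = "b->a"      # only replica b changed
        else:
            actions[path] = "conflict"  # both changed, ask the user
    return actions


# File A edited at the office, file B on the laptop, file C on both.
archive = {"A": "v1", "B": "v1", "C": "v1"}
office = {"A": "v2", "B": "v1", "C": "v2"}
laptop = {"A": "v1", "B": "v2", "C": "v3"}
result = reconcile(archive, office, laptop)
print(result)
```

The remembered state is what makes the direction unambiguous: without it, a synchronizer could only see that two copies differ, not which side was edited.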

Installation

To install it, follow the documentation, which is very well written: I will not repeat the instructions here. Keep in mind that you need to download (1) OCaml, (2) Unison itself (for the Mac: download the source, then run make following the instructions), and (3) on a PC, Cygwin, in order to have a command-line ssh. Steps (1) and (2) may require you to run several 'make' commands: just follow the detailed instructions. I had no problems with (1) and (2). I did have problems with (3): if you are a Cygwin novice like me, you need to know that after installing the default Cygwin package you have to install ssh in a second step. When you do this, search for the openssh package, not ssh (there is no package called ssh).

Notice that you need to install Unison (and OCaml) on each platform where you plan to run it: for example on your Mac, and in your directory on the Unix file server with which you want to synchronize. Unison comes both as a command-line tool and with a GUI (but there is no GUI available for the Mac). I prefer the command line, except that I couldn't install it on Windows, so I'm using the GUI version there.

Configuration file

After installation, you need to write your own unison configuration file, ~/.unison/default.prf. The documentation has several examples for the configuration file, but if you need more inspiration, here is what I use:

# Unison preferences file
# remote unison command
servercmd=~suciu/PROJECTS/SW/UNISON/unison

# Roots of the synchronization
root = /Users/suciu/PROJECTS-UNISON
root = ssh://barb.cs.washington.edu/PROJECTS

# Some regexps specifying names and paths to ignore
ignore = Name temp.*
ignore = Name *~
ignore = Name _region_.*
ignore = Name .*~
ignore = Name *.tmp
ignore = Name TEMP
ignore = Name ARCHIVE

# Be fast even on Windows
fastcheck = yes

There are more advanced preferences described in the documentation.
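The 'ignore = Name ...' lines in the profile are shell-style globbing patterns matched against the last component of a path. As far as I understand the manual, a wildcard at the start of a pattern does not match a leading dot, which is why the profile lists both *~ and .*~. Here is a rough Python sketch of that behaviour. It uses Python's fnmatch module as a stand-in and is not Unison's own matcher: it ignores the {a,bb} alternation syntax and any case-insensitivity on Windows or the Mac, and the helper names are made up.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The Name patterns from the profile above.
IGNORED_NAMES = ["temp.*", "*~", "_region_.*", ".*~", "*.tmp", "TEMP", "ARCHIVE"]


def name_matches(pattern, name):
    """Approximate one 'Name' pattern (hypothetical helper)."""
    # A wildcard at the start of the pattern must not swallow a leading dot.
    if name.startswith(".") and pattern[:1] in ("*", "?"):
        return False
    return fnmatchcase(name, pattern)


def is_ignored(path, patterns=IGNORED_NAMES):
    """True if the last component of path matches any pattern."""
    name = basename(path)
    return any(name_matches(p, name) for p in patterns)


for path in ["notes/draft.txt~", "notes/.draft.txt~", "paper/main.tex", "old/ARCHIVE"]:
    print(path, is_ignored(path))
```

The sketch only looks at the last component of each path. In unison, once a directory such as ARCHIVE is ignored, everything below it is skipped as well.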

Normal operation

When you first run it, expect it to take several hours to create a copy of your entire directory. If you run it from home, expect days (about 3 days in my case, over DSL; if I were to do it again, I might consider bringing my home machine to the office instead).

During normal operation it's very fast: less than 20 seconds to traverse the entire directory, plus the time to copy any files. Its performance is full of pleasant surprises, especially over a slow link. For example, if you copy 30 PowerPoint files from one directory to another (which I do at the beginning of a course), then unison will not send the files over the network, but instead will simply repeat the copy on the remote server. Instead of 10 minutes (over a DSL line), it takes only 1 second, and you may wonder if unison screwed something up: no, it didn't, it's just smart. Some of these tricks are probably due to the rsync-style transfer underneath, some may be unison's own.
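I don't know exactly how unison implements this shortcut, but the idea can be sketched as follows: before sending a new file, check whether the remote side already holds a file with identical contents, and if so copy it there locally. The Python below is only an illustration under that assumption. The function names, the use of SHA-256 as the fingerprint, and the file names are all invented for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def plan_transfers(new_files, remote_fingerprints):
    """Plan how each new local file should reach the remote side.

    new_files:           path -> bytes, files that appeared locally
    remote_fingerprints: path -> fingerprint, files already on the remote
    """
    by_content = {fp: path for path, fp in remote_fingerprints.items()}
    plan = []
    for path, data in sorted(new_files.items()):
        source = by_content.get(fingerprint(data))
        if source is not None:
            # Same bytes already exist remotely: copy there, send nothing.
            plan.append(("copy-on-remote", source, path))
        else:
            plan.append(("send", None, path))
    return plan


remote = {"course-old/lec1.ppt": fingerprint(b"slides 1")}
new = {
    "course-new/lec1.ppt": b"slides 1",       # a copy of an existing file
    "course-new/syllabus.txt": b"new text",   # genuinely new content
}
plan = plan_transfers(new, remote)
print(plan)
```

Only the fingerprints need to cross the slow link in the first case, which is consistent with 30 copied files taking a second rather than ten minutes.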

When I run it I type 'unison -batch', which tells it to accept the default recommendations automatically without asking questions; if there are conflicts, it leaves the conflicting files unchanged.

My own discipline is to use unison in a star configuration: I synchronize every personal machine with the department's Unix file server, never two personal machines directly. I have never tried to synchronize machine X with Y, then Y with Z, and finally Z with X, but unison is supposed to work that way too. You just need to write a different configuration file for each kind of synchronization: for example, in addition to the default configuration file ~/.unison/default.prf, create a file ~/.unison/x2z.prf, which specifies a different pair of directory roots and servers, then run either 'unison' (for the default file) or 'unison x2z'.
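If you end up with several such files, a small script can generate them. The Python sketch below writes out the text of a profile using only the 'root' and 'ignore' preferences that appear in my own file above. The function name, the host z.example.org and the paths are placeholders, not real machines.

```python
def render_profile(local_root, remote_root, ignore_names=()):
    """Return the text of a minimal unison profile (hypothetical helper)."""
    lines = [
        "# Unison preferences file",
        f"root = {local_root}",
        f"root = {remote_root}",
    ]
    lines += [f"ignore = Name {name}" for name in ignore_names]
    return "\n".join(lines) + "\n"


# Placeholder roots: substitute your own directory and server.
x2z = render_profile(
    "/home/me/PROJECTS",
    "ssh://z.example.org/PROJECTS",
    ignore_names=["*.tmp", "*~"],
)
print(x2z)
```

Saving the printed text as ~/.unison/x2z.prf gives the second profile, which you then run with 'unison x2z'.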

Dan Suciu, 2003