Copying Files

My favorite saying is that in computers, there are at least three ways to do anything, and I’ve found that to be true nearly all the time. Sometimes, those three ways of doing something are even all practical!

For instance, if you want to rename a file, you can use any of these options:

mv foo bar
cp foo bar; rm -f foo
cat foo > bar; rm -f foo

Like I said, not all of them are practical, but there are at least three unique commands. Most of us would choose the first option and be done with it, but sometimes you need the change to be atomic, or sometimes you need continued access to the source file while the copy is in progress.
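
As a quick illustration of the atomic case, a common pattern is to stage your changes in a copy of the file and then swap it into place with a single rename (the file name here is purely for show):

cp -a app.conf app.conf.new      # stage a working copy
vi app.conf.new                  # make your changes to the copy
mv -f app.conf.new app.conf      # swap it in; a rename on the same filesystem is atomic

Anything reading app.conf sees either the old file or the new one, never a half-written in-between state.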

rsync

For migrations, there is one real Swiss Army knife of moving data: rsync. This gem of a tool will:

  • copy files locally and remotely over a variety of protocols,
  • allow partial copying, to pick up where you left off if the command exits unexpectedly,
  • preserve filesystem attributes like timestamps, ownerships, and permissions,
  • work recursively off of a single folder or from a list of files, and
  • (most conveniently) perform an update of a copied folder to collect new or changed files.

This last feature makes rsync nearly indispensable when you are syncing data for a website migration. It lets you make the lengthy initial copy while the site remains live, and then a short update of the copied data later, during a brief scheduled downtime window.
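
In practice, that two-pass pattern might look like this (the flags are explained below, and only the second pass needs the downtime window):

rsync -aHPz root@${ip}:/home/username/ /home/username/    # long initial copy, site still live
# ...later, during the scheduled window:
rsync -aHPz root@${ip}:/home/username/ /home/username/    # quick pass, picks up new and changed files

If files may have been deleted on the source in the meantime, adding --delete to the second pass will prune them from the copy as well; just be careful with that flag.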

When copying to or from a remote server, rsync spawns a second rsync process on the remote end, which requires that its binary be present on both the source and the target machines. Fortunately, it is such a useful tool that you will find it in the default installation of almost every web server, and where you don’t, it is easily installable through OS repositories, rpm/deb files, or, in the worst case, source compilation. Rsync can also negotiate an older protocol version if it finds that the remote system is not running the same application version, though some features will be missing. For the most part, however, the flags you need will almost always be present:

  • -a (--archive) is a summary flag for -rlptgoD, which essentially makes a very accurate copy of a folder and its contents recursively.
  • -H (--hard-links) preserves hard-linked files, just in case there are any. Saves you a bit of disk space.
  • -P (--partial --progress) will keep partially transferred files (so you can continue them if you get disconnected) and show a progress output during transfer.
  • -z (--compress) will attempt to compress files during the transfer for lower bandwidth usage (at the expense of some processing power).

There are myriad additional options, and you can read more on the rsync man page. Most of my commands look like this:

rsync -avHPze "ssh -p${port}" root@${ip}:/home/username/ /home/username/

This includes the -v flag for verbosity, and the -e flag to specify the remote shell command and its own flags (such as a non-default SSH port, or possibly an encryption cipher). If you are using the default SSH port 22 (first of all, why do you hate security? Moving SSH off port 22 is like the number one anti-brute-force measure), you can leave off the -e flag and the quoted ssh string that follows it, as rsync uses SSH on port 22 by default.
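
With the default port, the same pull shrinks to:

rsync -avHPz root@${ip}:/home/username/ /home/username/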

Trailing slashes, as with many Linux commands, are very important here. In this case, the contents of /home/username/ at ${ip} will be copied into the local /home/username/ directory. Here is the same command, slightly different:

rsync -avHPze "ssh -p${port}" root@${ip}:/home/username /home/

This copies the remote folder /home/username itself into /home/ locally, along with everything recursively inside it. Both work equally well; the second is just shorter (plus it doesn’t require that the target folder already exist, and it also copies the permissions of that folder).

Why am I always pulling data on the target server? Read up on Ebury.

scp

Rsync is great and all, but what if you have just one file to copy? Isn’t there a simpler command? I give to you: scp. It’s a straightforward way to copy one file or a small number of files to another machine (it’s short for secure copy, as it uses the SSH protocol).

Say you’ve created a file with some data that you need to import on your new machine, such as a list of allowed IP addresses. Here’s how you might get that over to the target server using scp:

scp root@${ip}:/root/ips.txt /root/
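
Since I run this on the target, it’s a pull; pushing instead is just the same arguments reversed:

scp /root/ips.txt root@${ip}:/root/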

Easy peasy. If you have a custom SSH port, you just pass that with the -P flag (capital, unlike ssh’s own lowercase -p). You can also pass a wildcard:

scp root@${ip}:/var/named/*.db /home/temp/remote_zonefiles/
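
If the remote side listens on a custom port, the same copy just grows a -P (port 2200 here is only an example):

scp -P 2200 root@${ip}:/var/named/*.db /home/temp/remote_zonefiles/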

Here’s a nearly identical wildcard copy done with rsync:

rsync -ptgoDvPz root@${ip}:/var/named/\*.db /home/temp/remote_zonefiles/

It’s important to escape (or quote) the asterisk with rsync, so it isn’t expanded by the local shell before the remote side can see it.

lftp

I’m really reaching here. rsync is so great that I use it wherever possible. But sometimes SSH just isn’t available and we must resort to FTP, meaning that rsync cannot spawn a remote rsync process to serve files to itself. Luckily, there is a good command-line utility that we can use to copy a whole remote folder from an FTP server: lftp.

This doesn’t come by default with any server OS I’ve seen, but it is available via yum and apt repositories almost every time. Most FTP programs are just about the same; they establish a connection, list files, get files, change directories, and so on. lftp is different, though, because of:

  • its flexibility in connection types and TLS negotiation,
  • its use of common Unix commands for remote browsing (like cat and find),
  • the ability to run local commands from the same terminal (such as !ls),
  • backgrounding of processes and jobs,
  • passing a string of commands when spawning the program (with -e or -c), and
  • most importantly, mirror.
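
Before we get to mirror, here’s a rough sketch of what an interactive session looks like (the host, user, and paths are invented, and the prompt is simplified to >):

lftp -u user ftp.domain.tld   # opens an interactive session and prompts for the password
> find www/                   # browse the remote side with a familiar Unix command
> !ls                         # run ls locally without leaving lftp
> mirror -P5 -v www/ &        # background a transfer as a job
> jobs                        # check on it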

Mirror is a function of lftp that will recursively download a folder you specify to a local copy, rather than making a tarball or zip file and GETting that single file. A lot of GUI tools do this too, but mirror will also download files in parallel (a count you specify with -P), resume an interrupted download with -c, or update a local folder with just the newer files using -n. A reverse mirror (-R) is also available, if you need to push-update a local folder to a remote server, but this rarely comes up in my line of work. Here is a variation on my favorite command:

cd /home/temp/sitefiles; lftp -c "set ftp:ssl-allow false; mirror -P5 -v www/" -u user ftp.domain.tld

This will connect without SSL (many hosts I deal with have it turned off for some reason) and mirror the www directory’s contents into my $PWD, in this case /home/temp/sitefiles. Using -c instead of -e makes lftp exit once the commands are complete.
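
And should the rare push case come up, a reverse mirror of the same site might look something like this (same invented host and paths):

cd /home/temp/sitefiles; lftp -c "set ftp:ssl-allow false; mirror -R -P5 -v www/ www/" -u user ftp.domain.tld

With -R, the first path is the local source and the second is the remote target, so this uploads my local www/ back to the server.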