Syncing Databases

Databases are regularly the toughest things for me and clients to deal with. Possibly because I’m not a DBA and couldn’t write a CREATE TABLE statement if you held a gun to my head, but I still need to know the workings of SQL engines and how to safely, accurately, and quickly extract data (generally I can get two out of three). I’ll be referencing primarily MySQL in this article, which of course carries over to MariaDB and Percona as well. The same general principles, though, could apply to any SQL database with similar toolsets.

99% of Websites

For most websites, you can plan on having a folder and a database for each CMS install, which is usually one per domain or subdomain. Databases for most sites are in the range of a few MB to a few hundred MB, sometimes up to a GB or two. For this type of database, you can expect to use mysqldump to perform both the initial and final syncs. It’s quick (-q is on by default!), easy, almost always installed, dumps exactly what you need with very few flags, and is available to all Linux users (meaning you can use it even if you don’t have root). You don’t even need a shell; phpMyAdmin and phpPgAdmin can net you a full export in a pinch.
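If you’re on a shared host without root, a plain dump as the site’s own user works just the same. A minimal sketch, assuming a hypothetical devmca MySQL user that owns the database (adjust names and paths to taste):

# Dump as a non-root user; -p will prompt for the password:
devmca@host3 [~]$ mysqldump -u devmca -p devmca_wp902 > ~/devmca_wp902.sql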

Here is our sample website:

root@host3 [/home/devmca/public_html]# du -h --max-depth=0
404M	.

root@host3 [/home/devmca/public_html]# grep DB_NAME wp-config.php
define('DB_NAME', 'devmca_wp902');

root@host3 [/home/devmca/public_html]# mysql -e 'select table_schema, sum((data_length+index_length)/1024/1024) AS MB from information_schema.tables where table_schema="devmca_wp902"'
+--------------+------------+
| table_schema | MB         |
+--------------+------------+
| devmca_wp902 | 2.74252701 |
+--------------+------------+

Plenty of graphics and fancy code on this WordPress site, but not a lot going on upstairs. This site is running a 3MB (not even) database, and its dump doesn’t take long at all:

root@host3 [/home/devmca/public_html]# time mysqldump devmca_wp902 > /dev/null

real	0m1.297s
user	0m0.014s
sys	0m0.009s

Even slightly larger databases take a very short time. On the same system (a 1GB VPS), a 130MB database dumped in 5.3s, and all databases (totalling 830MB) took 18.7 seconds. When planning for downtime, database syncs should be accounted for, but the time spent will usually be dominated by you typing commands and switching between servers.
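If you want to size your own downtime window ahead of time, a dry run to /dev/null gives you a realistic floor. This is just the timing test from above widened to every schema (assumes root, which can read them all):

# Time a dump of every database without writing anything to disk:
root@host3 [~]# time mysqldump --all-databases > /dev/null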

Of course, these dumps were generated locally and sent to a local file (/dev/null). How does this look over a network? Well, it depends first on the connection speed between your source and target servers. If you aren’t changing datacenters, you can probably hit 12MB/s easily (100Mbit line speed), so the transfer itself is inconsequential. Moving from New Zealand to Maryland might take a bit longer. It also depends on the specs of each machine, but that matters more for very large databases, which I’ll address in another post.
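If you want a rough number for your particular link before committing to a window, you can pull some throwaway data over ssh and let dd report the rate when it finishes. Same pull-from-target pattern as the rest of this article; host3 and host2 are just our example source and target:

# Pull 100MB of zeroes from the source; dd prints the throughput on completion:
[root@host2 ~]# ssh host3 "dd if=/dev/zero bs=1M count=100" > /dev/null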

Doing the deed

There are two ways to attack a final sync for databases. The first is to redirect mysqldump’s stdout to a file, copy that over to the target server in a temporary location, back up the current database on the target (if any), and then import the full dump, overwriting anything on the target server that may have been created during testing. But this handles the data four separate times, assuming you make a backup of the target database:

# Create your backup:
root@host3 [/home/devmca/public_html]# mysqldump devmca_wp902 > ../devmca_wp902.finalsync.sql

# Copy to the target server:
[root@host2 ~]# scp host3:/home/devmca/devmca_wp902.finalsync.sql /home/

# Make your backup on target, assuming this is the final sync:
[root@host2 ~]# mysqldump devmca_wp902 > /home/devmca_wp902.prefinal.sql

# Import your new data:
[root@host2 ~]# mysql devmca_wp902 < /home/devmca_wp902.finalsync.sql

These commands all take human time to type, and they have to be run on two different machines. Slow. Wrong. Bigly.

We can cut that in half by streaming a database dump over ssh. This basic piped command is run from the target server:

# First, the backup:
[root@host2 ~]# mysqldump devmca_wp902 > /home/devmca_wp902.prefinal.sql

# Then, stream the new data:
[root@host2 ~]# ssh -C host3 "mysqldump devmca_wp902" | mysql devmca_wp902

Why am I always pulling data on the target server? Read up on Ebury.

Data just automatically starts importing line by line in MySQL on the new server! This command started saving me a lot of time (did you see that we turned on compression for this plain-text output with ssh’s -C flag?), and a lot of disk space too. You don’t run up against a looming disk quota or leave an endless supply of temporary dumps to garbage collect ‘later’, and you still have a backup in the event you overwrite something you shouldn’t have.
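If you’d rather the final sync also leave a file behind without a second pass over the data, bash process substitution can tee the stream into a compressed copy while it imports. A sketch, assuming your shell is bash and the backup path is whatever suits you:

# Import the stream and keep a gzipped copy of it, all in one pass:
[root@host2 ~]# ssh -C host3 "mysqldump devmca_wp902" | tee >(gzip > /home/devmca_wp902.finalsync.sql.gz) | mysql devmca_wp902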

Wait, that’s it?

Yeah, that’s about it. There are some edge cases that we can cover later, like very large databases, very large tables, high traffic sites, and some other items. mysqldump will do them all too, using the same techniques as above, albeit quite a bit more slowly than some other solutions.

Alright, alright. You want caveats, I can tell.

I’m sure you may have peeked at the man page for mysqldump and seen that it suggests using mysqlhotcopy. Sounds fancy, right? It’s super old, and only works with MyISAM tables. If you’re using all MyISAM anyway, why not just stop mysqld and sync the database folder? It works, you know…

[root@host2 ~]# rsync -avHP host3:/var/lib/mysql/devmca_wp902 /var/lib/mysql/ --update

Because a MyISAM table’s data lives entirely in that table’s own files inside the database directory, and you have stopped mysqld on both machines to prevent writes, you get a clean copy. But again, this only works for databases that are entirely MyISAM, as InnoDB and XtraDB store data in a shared data file (yes, even with innodb_file_per_table on). Read more on innobackupex in the edge case databases article.

I don’t recommend doing the above unless you can guarantee that no one and nothing else will start mysqld anywhere in the middle of it. Dumps are safer.
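If you do go the rsync route anyway, the whole dance looks something like this. Service names vary by distro (mysql, mysqld, mariadb), so treat this as a sketch assuming systemd:

# Stop writes on both machines first:
[root@host2 ~]# ssh host3 "systemctl stop mysqld"
[root@host2 ~]# systemctl stop mysqld

# Copy the table files, then bring the target back up:
[root@host2 ~]# rsync -avHP host3:/var/lib/mysql/devmca_wp902 /var/lib/mysql/
[root@host2 ~]# systemctl start mysqld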

The mysqldump command has the --opt flag on by default, which is rather handy. It adds a few key options for you: --add-drop-table, --add-locks, --create-options, --disable-keys, --extended-insert, --lock-tables, --quick, and --set-charset.

This has some clear advantages. DROP TABLE and CREATE TABLE statements will clear out the target database so you don’t have conflicting primary keys. Extended inserts pack many rows into each INSERT statement, so the output is smaller. Quick streams rows as they’re read instead of buffering each whole table in memory first (which lets you dump databases larger than your RAM). And the locks ensure you get a consistent snapshot of each database.
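To see what --opt is buying you, here is roughly the same dump spelled out by hand. Purely illustrative, since all of these are already on by default:

# --opt expanded into its component flags:
root@host3 [~]# mysqldump --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset devmca_wp902 > /dev/null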

If your database dump happens to be taking a bit longer than you expect, and reads and writes are queueing up, causing your site to slow, you can append the --skip-lock-tables flag, which overrides the --lock-tables default. You won’t get a perfectly consistent dump, but at least you will have data to test with.
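In stream form, that’s the same pull-from-target pattern with one extra flag:

# Dump without table locks so the live site keeps serving reads and writes:
[root@host2 ~]# ssh -C host3 "mysqldump --skip-lock-tables devmca_wp902" | mysql devmca_wp902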