Advanced Database Syncing

Are you the 1%? That is, does mysqldump just not cut it for your database? Maybe you have a million transactions an hour, or a 200GB database. Or a 200GB table.

Well, no matter which way you slice it, copying that database is going to take some time and effort. Fortunately, there are a number of tools designed to help you.

Mysqldump

Wait one gosh darn second there. You again? Yes, mysqldump can do more than full database dumps.

Let’s assume for a moment that you cleverly designed your 300GB database to have archive tables: data from time immemorial (immemorial except for the datestamps), stored in various named tables and rotated out judiciously on December 31st every year:
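Such a layout might look something like this (the database and table names here are hypothetical):

```shell
# Assumed schema: one live table plus read-only yearly archive tables
mysql -e "SHOW TABLES LIKE 'orders%'" mydb
# orders
# orders_archive_2010
# orders_archive_2011
# orders_archive_2012
```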

These tables are archived and assumed read-only. Once you have copied them over for testing, why copy them again? In this example, a 300GB database trims away most of its data; the tables that still need to be synced take up a dozen GB or fewer, which sync in 30-40 minutes or so:
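A final sync along those lines might be sketched like so (the database, table, and host names are placeholders):

```shell
# Dump only the live data; the read-only archive tables were
# copied once during testing and have not changed since.
mysqldump --single-transaction \
  --ignore-table=mydb.orders_archive_2010 \
  --ignore-table=mydb.orders_archive_2011 \
  --ignore-table=mydb.orders_archive_2012 \
  mydb | mysql -h new-server.example.com mydb
```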

If you can intelligently reduce the data for your final sync, you will in turn reduce downtime significantly; I would expect a 300GB database like this to take 24 hours or more with traditional methods.

XtraBackup

This is a set of open source tools from the Percona team, most commonly used through the innobackupex wrapper. If you are making a full clone of a MySQL installation, this is your go-to command. I generally only see it used to create a full hot backup of an installation when setting up a slave server (or just as backups, which are always super nice to have). One advantage of XtraBackup is that MySQL will serve the exact same content on the new server as it did on the old in every regard, right down to the grants for root. Depending on the size of your databases, it can still take time to stream the backup over the network, but after that you only have a quick command to apply logs, and then just need to start mysqld.

Here’s how to create your first backup:
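A minimal invocation might look like this (the credentials and backup path are placeholders):

```shell
# Take a hot backup of the running instance; innobackupex reads
# the datadir location from my.cnf.
mkdir -p /root/backup
innobackupex --user=root --password=yourpass /root/backup
```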

The backup command will create its own timestamped folder inside the folder you specify; you can avoid this by skipping the mkdir and adding --no-timestamp to the end of the command. There are a number of different options you can use when creating the backup; see the man page for much more info than you ever needed.

Get that over to your target server, and then get the archive ready for restore. This command will replay the logged transactions that were run while the backup was created:
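Something along these lines, with the timestamped folder name standing in for whatever the backup step produced:

```shell
# Replay the transactions logged while the backup ran, so the
# data files are consistent and ready to restore.
innobackupex --apply-log /root/backup/2014-01-01_12-00-00
```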

Stop and move your running mysqld data_dir, restore, chown, and start:
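For a stock install with the data directory at /var/lib/mysql, that sequence might be sketched as:

```shell
service mysqld stop
mv /var/lib/mysql /var/lib/mysql.old   # keep the old datadir around, just in case
mkdir /var/lib/mysql
innobackupex --copy-back /root/backup/2014-01-01_12-00-00
chown -R mysql:mysql /var/lib/mysql
service mysqld start
```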

That’s it! You are now running an ACID-compliant, point-in-time image of the source server’s database engine.

Wanna get super fancy? You can skip the local backup and separate copy step by streaming the backup straight to the target server. Percona thought to include a --stream flag:
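One way to stream over SSH, for example (the host and paths are placeholders):

```shell
# Stream a tar of the backup straight to the target; note the -i
# flag on extraction, which Percona's docs call for on tar streams.
innobackupex --stream=tar /tmp | ssh root@target.example.com \
  "tar -ixf - -C /root/backup"
```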

Wanna get even MORE super fancy? innobackupex can back up single or select databases or tables, though restoring them is a bit of a pain. See the man page for more info on this (search for Restoring Individual Tables). I won’t get into it here, but it could be applicable if you just need to copy a few tables.

All of this sounds pretty nifty, huh? Sorry, but there’s a downside: if you take a full backup of your running MySQL instance, you still have to copy that backup to your new server, pushing 300GB over the network. At an optimal 12MB/s, that’s about 7 hours. This is better used for initial syncs, where a mysqldump would be impactful to traffic (in which case, you might consider using --parallel=4 and --rsync to use more threads and to copy non-InnoDB data using rsync instead of cp). Although, if you had a cross-mounted SAN with a gig link, you could trim that copy timeframe quite a bit (ah, to have such resources!). Trimming the number of tables, as in the previous section, can help, but that isn’t always possible.

Percona Toolkit

The Percona Toolkit comes with a neat tool called pt-table-sync. It’s designed to combat slave drift in master/slave setups by checking tables, databases, or full servers, creating transactions for any missing data, and applying them to the server you specify. For actual master/slave setups, it applies NOOP commands to the master, which propagate to the slaves via the binlogs.
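A sketch of typical usage (the host, database, and table names are placeholders):

```shell
# Preview the statements that would fix the slave...
pt-table-sync --print --sync-to-master h=slave.example.com,D=mydb,t=orders
# ...then apply them through the master so they replicate down.
pt-table-sync --execute --sync-to-master h=slave.example.com,D=mydb,t=orders
```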

This sounds really great in theory. The disadvantage I have encountered is that it takes a long time to perform its hashing, counting, and comparison, and it doesn’t do extended inserts (it issues one row per command), so if you have a very active database, you still end up with a lot of slow insert commands to run.

Replication

Yep, actually setting up a master/slave relationship between your source and target machines. It’s a bit of a pain to do, and you will have to be very careful with this one; any test transactions you perform on the target server will cause drift or duplicate keys, possibly stopping replication. The aforementioned pt-table-sync can help in this regard. But all the work of setting up this replication pair has a distinct advantage: no database syncing for the final sync. Just finish replaying any queued transactions, stop the master, and promote the slave.
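The final cutover might be sketched as follows; RESET SLAVE ALL assumes MySQL 5.5.16 or later:

```shell
# On the master, in one interactive session (the read lock is
# released as soon as the session closes):
#   mysql> FLUSH TABLES WITH READ LOCK;
#   mysql> SHOW MASTER STATUS;

# On the slave, wait for it to catch up...
mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
# ...then stop replication and promote it to standalone.
mysql -e "STOP SLAVE; RESET SLAVE ALL;"
```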

If you have the wherewithal to establish this type of cluster, and your target server is only running MySQL (no control panels, unless you have a separate remote database machine in your new setup), it can mean beautiful things for your final sync downtime, though your application/CMS connection strings may need to be pointed at the new host or IP.