Continuent Blog: Advanced rsync for Database Provisioning

Blog

Our team of MySQL database experts regularly blogs on topics that range from MySQL availability, MySQL replication, multi-master MySQL, and MySQL-aware proxies, all the way through to ‘how to’ content for our solutions: Tungsten Clustering, Tungsten Replicator and Tungsten Proxy.

Check Out Part 1

In a previous blog article, we saw how to use rsync to provision a database within a Tungsten Cluster. As a reminder, rsync is useful for syncing files or directories that differ from source to target. scp can also be used for this purpose, however scp will send all files, whether they differ or not. For databases, if the target already has an older version of a database (or even a corrupt database), in theory we could save a lot of time reprovisioning the database if we could send only the changes, rather than the entire database files.

Adding the Options

rsync normally copies whole files, based on if the file timestamps differ from source to target. For large databases, however, this can be quite inefficient. Consider a large, active database: during the course of a day, the database is updated, however at the file level, only a fraction of the blocks on the filesystem have changed. Therefore, we’d like to just transfer the changes, instead of the whole file, since the files are quite large.

In Part 1, we have the following rsync command:

copy

db2 shell$ sudo rsync -avz --progress --delete db1:/volumes/data/mysql/ /volumes/data/mysql/

As a review, this command compares the timestamps and sizes between the files in /volumes/data/mysql between db1 and db2 and will copy files that have timestamps newer on db1. When doing the actual copy, rsync copies the file to a temporary hidden file in the target directory. When the copy is complete, the old file is removed and replaced with the hidden file.

If the target already contains a database, we can simply transfer the changes from the source. This works even if the target database is corrupt. We are simply going to retain the database files that are already there and update the file blocks where necessary.

Let’s add some additional options to make this more database friendly:

`-c`	When comparing the actual files, compare based on file checksum rather than file mod date and file size. Although this is a slower comparison, database file mod times and sizes are not always updated even when actively being written to.
`--inplace`	When copying, do not create a temporary file, copy the source file directly to the target file.
`--no-whole-file`	Do not copy the whole file, copy only the changed blocks in the file.

With the above options, provided we already have a database on the target, even corrupt, that needs to be reprovisioned, we transfer only the changed blocks from source to target. Consider a large, active database - the delta between the database files between source and target is a small percentage.

Our new command line now looks like this:

copy

db2 shell$ sudo rsync -ravc --inplace --no-whole-file --progress --delete db1:/volumes/data/mysql/ /volumes/data/mysql/

Seeding a Database

Suppose we have db1 and db2, where db2 is a replica of db1. During operations, db2 somehow gets corrupted and is no longer viable. We may be tempted to take a maintenance window, remove db2, and then copy db1 from db2. However, using the method above, we can still make use of the files on db2 and simply copy the changed blocks from db1.

We still have to take a maintenance window even by using the method above, although it should be much smaller than copying the entire database. We can further reduce this window by “seeding” the database on db2: Throughout the day, we can run the rsync command even while db1 is running. This of course will not create a working database on db2, but it will send incremental changes from db1 to db2. We can do this a number of times throughout the day, and when we finally take our maintenance window, there will be even fewer changes to send from db1 to db2 because we have already seeded db2 earlier.

Multiple Threads

Comparing checksums on large files can be time-consuming. In this case, can we create multiple rsync threads to make the provision process more robust. Here’s one way to accomplish this from db1:

copy

db1 shell$ sudo "cd /volumes/data/mysql/; find . -type f |xargs -I {} -P 4 -n 1 rsync -ravc --inplace --no-whole-file --progress --delete {} db2:/volumes/data/mysql/"

In the above command, we do a find on the MySQL data directory to enumerate all of the files that make up the database. We pipe that into xargs to rsync each file, using a number of concurrent threads. In the example, we use 4 threads (as defined by the -P parameter).

Conclusion

Provisioning a database can be extremely time-consuming, and typically requires a long maintenance window. Using the techniques above, we can leverage the existing files of a failed or corrupted database to make the provisioning much quicker. In the latest version of Tungsten Cluster, the utility tprovision utilizes these techniques when using rsync as the provision method.

Published In

Categories:

Database Administration, Deployment

Series:

MySQL High Availability (HA) & Disaster Recovery (DR)

Tags:

provisioning, rsync, MySQL, MariaDB

Author

Matthew Lang

VP of Customer Success, Americas

Matthew has over 25 years of experience in database administration, database programming, and system architecture, including the creation of a database replication product that is still in use today. He has designed highly available, scalable systems that have allowed startups to quickly become enterprise organizations, utilizing a variety of technologies including open source projects, virtualization and cloud.

View All Matthew’s Posts

Advanced rsync for Database Provisioning

Check Out Part 1

Adding the Options

Seeding a Database

Multiple Threads

Conclusion

Published In

Author

Comments

Add new comment

Filtered HTML