Recently, we were approached with the following, seemingly minor, complaint involving Amazon’s Aurora database:
“The MySQL db was unexpectedly modified and restarted by the AWS Backup Service. It was restarted with snapshots/logging set to 0. When we saw this, we changed the retention period back and restarted the db.”
What happened, and what is the impact?
Understanding the Issue
Aurora is a managed database that runs on AWS’s RDS service. Because it is fully managed, it’s fairly easy to get started and to configure basic functionality such as backups. The database is managed through a web-based management console, command-line tools, or an API. There is no direct access to the underlying database host, so you cannot administer it like a traditional MySQL database; you must use the available tools.
In this particular instance, the organization was using Aurora, had backups configured, and was most likely using read replicas (read-only copies of the database, used either as local read databases or as cross-Region replicas), because they had binary logging enabled. Binary logging in Aurora is required for cross-Region read replicas. It is off by default, but can be enabled by creating a custom DB parameter group for the cluster and changing the binlog_format parameter. You must then reboot the database for the parameter to take effect.
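As a rough illustration of the steps just described, here is a minimal Python sketch that builds the requests for a custom cluster parameter group with binlog_format set, and then sends them through a boto3 RDS client. The helper names (binlog_parameter_requests, apply_binlog_settings), the group name, and the parameter group family are assumptions made for this example; they are not taken from the incident.

```python
from typing import Any, Dict

VALID_BINLOG_FORMATS = {"ROW", "STATEMENT", "MIXED", "OFF"}


def binlog_parameter_requests(group_name: str, family: str,
                              binlog_format: str = "ROW") -> Dict[str, Dict[str, Any]]:
    """Build keyword arguments for the RDS calls that create a custom
    cluster parameter group and set binlog_format in it."""
    fmt = binlog_format.upper()
    if fmt not in VALID_BINLOG_FORMATS:
        raise ValueError(f"unsupported binlog_format: {binlog_format!r}")
    return {
        "create": {
            "DBClusterParameterGroupName": group_name,
            "DBParameterGroupFamily": family,  # assumed, e.g. "aurora-mysql8.0"
            "Description": "Custom cluster parameters with binary logging enabled",
        },
        "modify": {
            "DBClusterParameterGroupName": group_name,
            "Parameters": [{
                "ParameterName": "binlog_format",
                "ParameterValue": fmt,
                # The change only takes effect after a reboot.
                "ApplyMethod": "pending-reboot",
            }],
        },
    }


def apply_binlog_settings(rds: Any, requests: Dict[str, Dict[str, Any]],
                          cluster_id: str, writer_instance_id: str) -> None:
    """Send the requests through a boto3 RDS client, e.g. boto3.client("rds").

    In practice you would wait for the cluster to return to the
    "available" state before issuing the reboot.
    """
    rds.create_db_cluster_parameter_group(**requests["create"])
    rds.modify_db_cluster_parameter_group(**requests["modify"])
    rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        DBClusterParameterGroupName=requests["create"]["DBClusterParameterGroupName"],
    )
    # This reboot is the outage discussed in this post.
    rds.reboot_db_instance(DBInstanceIdentifier=writer_instance_id)
```

Keeping the request construction separate from the API calls makes it possible to review the exact parameters before anything touches a live cluster.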
The Unplanned Reboot
It appears that there was an issue with the automated RDS backups, and the database was rebooted as a result. This, of course, means unexpected downtime while the database reboots. A reboot of an Aurora database takes minutes, so this was potentially a painful outage.
Loss of Binary Logs
The next issue is that binary logging was reset. This, of course, makes the read replicas useless: they no longer receive transactions and are immediately out of sync with the primary database.
The Necessary Reboot
The admins noticed this quickly and re-enabled binary logging, which required yet another reboot to take effect. This resulted in another outage of several minutes. If they hadn’t noticed so quickly, they could have been serving reads from out-of-date replicas. According to their description, they also would not have been getting proper backups.
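Catching this kind of silent change does not have to depend on someone happening to look. The following is a minimal sketch of a drift check, assuming the inputs have the same field names that boto3’s describe_db_clusters and describe_db_cluster_parameters responses use (BackupRetentionPeriod, ParameterName, ParameterValue). The function name find_drift and the retention threshold are hypothetical.

```python
from typing import Any, Dict, List


def find_drift(cluster: Dict[str, Any], parameters: List[Dict[str, Any]],
               min_retention_days: int = 1) -> List[str]:
    """Return human-readable problems found in a cluster's settings.

    `cluster` is one entry of the "DBClusters" list and `parameters` is
    the "Parameters" list, both assumed to come from the RDS API.
    """
    problems: List[str] = []

    # A retention period of 0 matches the state described in the complaint.
    retention = cluster.get("BackupRetentionPeriod", 0)
    if retention < min_retention_days:
        problems.append(
            f"backup retention is {retention} day(s), expected at least {min_retention_days}"
        )

    # Treat a missing or empty binlog_format the same as OFF.
    values = {p["ParameterName"]: p.get("ParameterValue") for p in parameters}
    binlog_format = (values.get("binlog_format") or "OFF").upper()
    if binlog_format == "OFF":
        problems.append("binlog_format is OFF, binary log replicas will stop receiving changes")

    return problems


if __name__ == "__main__":
    # Sample data only; a real check would fetch these from the RDS API.
    sample_cluster = {"DBClusterIdentifier": "demo-cluster", "BackupRetentionPeriod": 0}
    sample_parameters = [{"ParameterName": "binlog_format", "ParameterValue": "OFF"}]
    for problem in find_drift(sample_cluster, sample_parameters):
        print(problem)
```

Run on a schedule and wired to an alert, a check like this would shorten the window in which replicas and backups are quietly broken, although it would not prevent the reboots themselves.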
Rebuilding of all Read Replicas
Lastly, any transactions committed while binary logging was off are lost from the perspective of the read replicas. This means that the read replicas will never match the primary, and must be destroyed and rebuilt. Rebuilding the read replicas is yet another task for the admins, and a potential interruption of service to applications that depend on them.
Where Does Continuent Fit In?
While we understand the value of cloud-based managed database solutions, they have some significant drawbacks, including unexpected and unpredictable behavior. Tungsten Cluster is deployed, in the cloud or on premises, using standard MySQL (or MariaDB, or Percona Server), which has predictable behavior and provides full control.
When we say we provide High Availability, it means applications stay online even in the event of a database failure, or a general host failure. You can interact with your favorite MySQL variant as you normally would, because there are no restrictions or special tools needed to administer the database.
Also, with Tungsten Cluster, changing a database parameter does not require an application outage. This incident simply would not have happened if using Tungsten Clustering.
We have done the full comparison of Tungsten Cluster to Amazon’s Aurora here: https://www.continuent.com/resources/blog/alternative-amazon-aws-aurora-mysql-tungsten-clustering-mysql-high-availability-ha.
And more specifically, deploying MySQL at geo-scale: https://www.continuent.com/resources/blog/when-aurora-global-database-not-global.
Managed database solutions do offer some ease and simplicity; however, there is always a price (quite literally!) to pay for this. Tungsten Cluster provides the High Availability required for critical applications, and also provides the tools for ease of administration. These include Tungsten Dashboard, a web-based GUI, as well as APIs and command-line tools. You can also choose where to host Tungsten Cluster: with your favorite cloud provider(s), on premises, or hybrid-cloud. Feel free to launch a Tungsten Cluster in AWS using the AMI, or contact us, and we will be happy to show you!