Continuent Blog: Recently a Continuent Customer Asked: How Does Tungsten Replicator Handle Transient Errors When Applying to AWS Redshift?

Summary

In this blog, part of a series of “Recently a Customer Asked” posts for Tungsten University, we explore the reason for the occasional Tungsten Replicator OFFLINE:ERROR state when applying to AWS Redshift, along with possible steps to compensate for the issue.

The Question

How do we handle the occasional Tungsten Replicator OFFLINE:ERROR state when applying to AWS Redshift? We see Errno 104 Connection reset by peer messages.

The Reason

It appears that there was a transient error with s3cmd, which is used to upload files to AWS S3 as part of the process of loading data into Redshift. For this specific customer, the issue only occurred when deleting files in S3.

According to the INI file, this Replicator did have auto recovery enabled, which is a good thing. The setting was configured to wait 5 minutes before attempting recovery, which is probably why this customer noticed the outage:

auto-recovery-delay-interval=300s

The Solution

The interval can be made shorter so that the Replicator attempts to go back online more quickly after an error. Change auto-recovery-delay-interval from 300s to 30s in tungsten.ini, and then run tpm update to apply the change.

auto-recovery-delay-interval=30s
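If the same change has to be made on several hosts, the edit can be scripted. The following is a minimal Python sketch, not part of any Tungsten tooling: the helper name set_ini_option and the sample fragment are hypothetical, and it assumes the option appears on a line of its own in key=value form. It only rewrites the text; tpm update still has to be run afterwards.

```python
import re


def set_ini_option(ini_text: str, key: str, value: str) -> str:
    """Return ini_text with the value of `key` replaced by `value`.

    Only lines that begin with the exact key are changed, so comments
    and all other options are preserved. Raises KeyError if the key
    is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids backslash interpretation in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, ini_text)
    if count == 0:
        raise KeyError(key)
    return new_text


# Illustrative fragment only; a real tungsten.ini holds many more options.
SAMPLE_INI = """[defaults]
auto-recovery-delay-interval=300s
auto-recovery-max-attempts=3
"""

updated = set_ini_option(SAMPLE_INI, "auto-recovery-delay-interval", "30s")
print(updated)
```

Working on the text line by line, rather than parsing and re-serializing the file, keeps any comments and option ordering in the file untouched.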

Deep Dive

To enable Auto-Recovery in the replicator, there are three separate options to configure in the [defaults] section of your tungsten.ini file. Here they are with the default values:

auto-recovery-delay-interval=300s
auto-recovery-reset-interval=15s
auto-recovery-max-attempts=3

auto-recovery-delay-interval - The delay between the replicator identifying that autorecovery is needed, and autorecovery being attempted. For busy MySQL installations, larger numbers may be needed to allow time for MySQL servers to restart or recover from their failure.

auto-recovery-max-attempts - Specifies the number of attempts the replicator will make to go back online. When the number of attempts has been reached, the replicator will remain in the OFFLINE state.

Autorecovery is not enabled until auto-recovery-max-attempts is set to a non-zero value. The state of autorecovery can be determined using the autoRecoveryEnabled status parameter. The number of attempts made to autorecover can be tracked using the autoRecoveryTotal status parameter.

auto-recovery-reset-interval - The time in ONLINE state that indicates to the replicator that the autorecovery procedure has succeeded. For servers with very large transactions, this value should be increased to allow the transaction to be successfully applied.

More Information

Auto-Recovery feature docs
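To see how the three auto-recovery settings described in the Deep Dive interact, here is a small Python model. It is a sketch built only from the descriptions in this post, not the Replicator's actual implementation: the class and function names are hypothetical, the values are illustrative, and it assumes that staying ONLINE for at least the reset interval clears the attempt counter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AutoRecoveryConfig:
    delay_interval_s: int   # auto-recovery-delay-interval
    reset_interval_s: int   # auto-recovery-reset-interval
    max_attempts: int       # auto-recovery-max-attempts (0 = disabled)


def simulate(cfg: AutoRecoveryConfig,
             online_durations_s: List[int]) -> Tuple[str, int]:
    """Model a run of errors and recovery attempts.

    The replicator starts with one error. Each entry in
    online_durations_s is how long it stays ONLINE after a recovery
    attempt before failing again; once the list is exhausted it stays
    ONLINE. Returns (final_state, total_seconds_spent_waiting).
    """
    attempts = 0
    waited_s = 0
    durations = iter(online_durations_s)
    while True:
        # An error has just taken the replicator offline.
        if attempts >= cfg.max_attempts:
            return "OFFLINE", waited_s
        waited_s += cfg.delay_interval_s
        attempts += 1
        duration = next(durations, None)
        if duration is None:
            return "ONLINE", waited_s
        # Assumption: a long enough ONLINE period counts as success
        # and resets the attempt counter.
        if duration >= cfg.reset_interval_s:
            attempts = 0


slow = AutoRecoveryConfig(delay_interval_s=300, reset_interval_s=15, max_attempts=3)
fast = AutoRecoveryConfig(delay_interval_s=30, reset_interval_s=15, max_attempts=3)

# A single transient error: the outage lasts at least one delay interval.
print(simulate(slow, []))
print(simulate(fast, []))
# Repeated quick failures exhaust the attempts and leave the replicator OFFLINE.
print(simulate(fast, [5, 5, 5]))
```

In this model a single transient error costs one full delay interval of downtime, which is why shortening the delay from 300s to 30s makes the outage far less visible.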

Wrap-Up

In this blog, part of a series of “Recently a Customer Asked” posts for Tungsten University, we explored the reason for the occasional Tungsten Replicator OFFLINE:ERROR state when applying to AWS Redshift, along with possible steps to compensate for the issue.

Smooth sailing!

Published In

Categories: Advanced Replication
Series: Tungsten University
Tags: MySQL, MariaDB, replication, Redshift

Authors

Eric M. Stone

COO and VP of Product Management

Eric is a veteran of fast-paced, large-scale enterprise environments with 40 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to world-wide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of customers, from Fortune 500 financial institutions to SMBs.

Matthew Lang

VP of Customer Success, Americas

Matthew has over 25 years of experience in database administration, database programming, and system architecture, including the creation of a database replication product that is still in use today. He has designed highly available, scalable systems that have allowed startups to quickly become enterprise organizations, utilizing a variety of technologies including open source projects, virtualization and cloud.
