The Question
Recently, a customer asked us:
Why would heavy disk IO cause the Tungsten Manager and not MySQL to be starved of resources?
For example, we saw the following in the Manager log file tmsvc.log:
2019/06/03 00:50:30 | Pinging the JVM took 29 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 25 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 21 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 16 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 12 seconds to respond.
2019/06/03 00:50:30 | Pinging the JVM took 8 seconds to respond.
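If you want to check whether your own Manager has hit the same condition, a quick search of the wrapper log will show it. The path below is only an example; use the location of tmsvc.log in your own installation:

# Count delayed-ping warnings in the Manager's wrapper log (example path)
grep -c 'Pinging the JVM took' /path/to/tungsten-manager/log/tmsvc.log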
The Answer
Why might a Java application be slow or freeze? In short: when another process is writing heavily to the same filesystem, that background I/O can block the writes made by the Java JVM's garbage collection (GC) logging, which extends the GC's stop-the-world pauses.
This problem is not specific to Continuent Tungsten products!
The following article from LinkedIn engineering explains the issue very well (and far better than I could - well done, and thank you):
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
Below is a quote from the above article (without permission, thank you):
Latency-sensitive Java applications require small JVM GC pauses. However, the JVM can be blocked for substantial time periods when disk IO is heavy. These are the factors involved:
- JVM GC needs to log GC activities by issuing write() system calls;
- Such write() calls can be blocked due to background disk IO;
- GC logging is on the JVM pausing path, hence the time taken by write() calls contributes to JVM stop-the-world (STW) pauses.
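To make the factors above concrete: GC logging is switched on with JVM startup flags, and every GC event then results in a write() to the file you point it at. The flags below are the standard Java 8-era HotSpot options (newer JVMs use the unified -Xlog:gc* syntax); the application name and log path are placeholders:

# Example Java 8-style GC logging setup. Every GC event is written to gc.log,
# and -XX:+PrintGCApplicationStoppedTime records how long each stop-the-world
# pause lasted, which makes I/O-inflated pauses easy to spot.
java -Xloggc:/var/log/myapp/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -jar myapp.jar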
The Solution
So What May Be Done to Alleviate the Problem?
You have options like:
- Tune the GC log location to use a separate disk, as described in the article above, to cut down on I/O contention.
- Move the backups or NFS-intensive jobs to another node.
- Unmount any NFS volumes and instead rsync the data to an admin host that is responsible for the NFS writes (i.e. move the mount to an external host; see the sketch below).
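For the third option, the idea is that the database node never writes directly to NFS; it hands its backups to an administrative host, and that host deals with any NFS traffic. A minimal sketch, assuming backups live under /opt/continuent/backups, that the NFS mount point shown is hypothetical, and that an admin host named adminhost is reachable over SSH:

# Stop writing to NFS from the database node...
umount /mnt/backups-nfs
# ...and push completed backups to an admin host instead; the admin host
# can write them to NFS (or anywhere else) without affecting the local JVM.
rsync -az /opt/continuent/backups/ backupuser@adminhost:/srv/backups/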
Again, I quote from the LinkedIn engineering article above (without permission, thank you again):
One solution is to put GC log files on tmpfs (i.e., -Xloggc:/tmpfs/gc.log). Since tmpfs does not have disk file backup, writing to tmpfs files does not incur disk activities, hence is not blocked by disk IO. There are two problems with this approach: (1) the GC log file will be lost after system crashes; and (2) it consumes physical memory. A remedy to this is to periodically backup the log file to persistent storage to reduce the amount of the loss.
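In practice, the tmpfs approach might look like the sketch below. The mount point, size, and backup interval are assumptions; the periodic copy addresses the crash-loss caveat mentioned in the quote:

# RAM-backed filesystem for the GC log (64 MB is plenty for a GC log)
mkdir -p /tmpfs
mount -t tmpfs -o size=64m tmpfs /tmpfs

# Start the JVM with its GC log on tmpfs
java -Xloggc:/tmpfs/gc.log -XX:+PrintGCDetails -jar myapp.jar

# Cron entry: copy the log to persistent storage every 5 minutes so that a
# crash loses at most a few minutes of GC history
*/5 * * * * cp /tmpfs/gc.log /var/log/myapp/gc.log.saved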
Another approach is to put GC log files on SSD (Solid-State Drives), which typically has much better IO performance. Depending on the IO load, SSD can be adopted as a dedicated drive for GC logging, or shared with other IO loads. However, the cost of SSD needs to be taken into consideration.
Cost-wise, rather than using SSD, a more cost-effective approach is to put GC log file on a dedicated HDD. With the only IO activity being the GC logging, the dedicated HDD likely can meet the low-pause JVM performance goal.
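Setting up a dedicated drive (SSD or HDD) for GC logging is ordinary system administration; the device name and mount point below are assumptions for the sake of the example:

# Assume /dev/sdb is a drive that will carry nothing but GC logs
mkfs.ext4 /dev/sdb
mkdir -p /gclogs
mount /dev/sdb /gclogs
echo '/dev/sdb /gclogs ext4 defaults,noatime 0 2' >> /etc/fstab

# Point the JVM's GC log at the dedicated drive
java -Xloggc:/gclogs/gc.log -XX:+PrintGCDetails -jar myapp.jar

Note that for the Tungsten Manager itself the JVM options are set in its service wrapper configuration rather than on a command line you invoke directly, so consult the Tungsten documentation (linked below) for where to add such flags in your release.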
Summary
The Wrap-Up
In this blog post we discussed why Java applications freeze or are slow under heavy I/O load and what may be done about it.
To learn about Continuent solutions in general, check out https://www.continuent.com/products.
The Library
Please read the docs!
For more information about Tungsten clusters, please visit https://docs.continuent.com.
Tungsten Clustering is the most flexible, performant global database layer available today - use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!
For more information, please visit https://www.continuent.com/products.
Want to learn more or run a POC? Contact us.