"Failure: Trying to Control Java GC"
I recently had the brilliant idea that I could control the Java garbage collector to keep it from interrupting some time-critical code. My plan was to do this with native code, but it failed... brilliantly. The theory is that GC will not begin until all running threads have reached a safe point (more on this later). Among other things, a Java thread that is waiting to return from a native call (i.e., via JNI) can delay this.
I began by creating a simple API with the methods pauseGC() and resumeGC(). The JNI implementation would have matching methods called nPauseGC() and nResumeGC(), where the "n" stands for "native." The theory of operation goes like this:
- pauseGC():
  1 - Create a Java thread that calls the JNI code, nPauseGC()
  2 - nPauseGC() will:
    2.a - Increment a counter
    2.b - Block by entering a spin loop until the value of the counter goes back to 0
  3 - Return to the Java calling thread
- resumeGC():
  1 - Call nResumeGC(), which will decrement the value of the counter in step 2.b for pauseGC above
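The counter handshake in the steps above can be modeled in plain Java to make the blocking behavior concrete. This is only a sketch of the protocol: it does not involve JNI and has no effect on the garbage collector, and the names PauseProtocolSketch, pause, resume, and pending are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PauseProtocolSketch {
    // Stands in for the counter kept by the native code (hypothetical name)
    private static final AtomicInteger calls = new AtomicInteger();

    // Models nPauseGC(): increment the counter, then spin until it drops back to 0
    public static void pause() throws InterruptedException {
        calls.incrementAndGet();
        while (calls.get() > 0) {
            Thread.sleep(1);
        }
    }

    // Models nResumeGC(): decrement the counter so the spinning thread can return
    public static void resume() {
        calls.decrementAndGet();
    }

    // Current counter value, exposed so callers can see whether a pause is active
    public static int pending() {
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Models pauseGC(): a separate thread makes the blocking call
        Thread blocker = new Thread(() -> {
            try {
                pause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        blocker.start();

        // Wait until the blocker has incremented the counter
        while (pending() == 0) {
            Thread.sleep(1);
        }
        System.out.println("blocker alive before resume: " + blocker.isAlive());

        // Models resumeGC(): release the blocked thread and wait for it to finish
        resume();
        blocker.join();
        System.out.println("blocker alive after resume: " + blocker.isAlive());
    }
}
```

The calling thread never blocks here; only the helper thread does, which is the same division of labor the pauseGC() design relies on.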
To call native code from Java, you first need to define the calls in your Java code like this:
    public class GCController {
        // ...

        // Native calls:
        public native void nPauseGC();
        public native void nResumeGC();
    }
Next, you need to generate the JNI header by using the javah utility found in the bin directory of your JDK installation. (javah was removed in JDK 10; on newer JDKs, javac -h produces the same header.) For this implementation, I ran the following from the command line:
javah -jni -d . -cp ./build/classes gccontroller.GCController
This tells javah to generate a JNI header for the class GCController in the package gccontroller and place it in the current directory. The output was the C++ header file called gccontroller_GCController.h.
Next, I implemented the API, with the methods pauseGC() and resumeGC(). The implementation of pauseGC() looks like this:
    public void pauseGC() {
        // Create a thread that calls into the native code and blocks
        // until signaled to return via the resumeGC method
        if ( paused ) {
            // already paused
            return;
        }
        // Set the flag before starting the thread so that a second call
        // made right away sees it and does not start another thread
        paused = true;
        new Thread() {
            public void run() {
                nPauseGC(); // Will block until signaled
                paused = false;
            }
        }.start();
    }
The implementation of resumeGC() is very simple:
    public void resumeGC() {
        // Signal the native code to unblock and return from the pauseGC call
        nResumeGC();
        paused = false;
    }
The native C++ code implementation is as follows, with explanation afterwards:
    #include <jni.h>
    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <unistd.h>
    #include "gccontroller_GCController.h"

    // Shared between the thread blocked in nPauseGC and the thread that
    // calls nResumeGC, so it needs to be atomic
    std::atomic<int> calls(0);

    // Spin (with a short sleep) until the counter drops back to zero
    void waitWhile() {
        while ( calls > 0 ) {
            usleep(1);
        }
    }

    /*
     * Class:     gccontroller_GCController
     * Method:    nPauseGC
     * Signature: ()V
     */
    JNIEXPORT void JNICALL Java_gccontroller_GCController_nPauseGC(JNIEnv* env, jobject obj) {
        // Start a native thread, and wait for it to finish
        printf("nPauseGC call counter=%d \n", ++calls);
        std::thread r (waitWhile);
        r.join();
    }

    /*
     * Class:     gccontroller_GCController
     * Method:    nResumeGC
     * Signature: ()V
     */
    JNIEXPORT void JNICALL Java_gccontroller_GCController_nResumeGC(JNIEnv* env, jobject obj) {
        // Signal the native thread to resume
        printf("nResumeGC call counter=%d \n", --calls);
    }
This is very simple really. The method Java_gccontroller_GCController_nPauseGC() (I know, JNI creates long-winded method names) increments a counter and creates a native thread that spins until the counter reaches zero. The call to join() ensures the method will not return until the thread's spin loop terminates via a call to Java_gccontroller_GCController_nResumeGC().
Testing It: Safe Points
My intention was to use this implementation to halt the garbage collector while my code executed something time-critical, so that it would not be interrupted. To test it, I created a method that called pauseGC(), went into a loop allocating memory until it was basically all used, and then called resumeGC(). When I ran it with the -XX:+PrintGC java command-line parameter, I didn't expect to see garbage collection events between my calls to pause and resume the GC, but I did. Why? Because the Java VM is smarter than I am.
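GC activity can also be checked from inside the test itself, rather than by reading the -XX:+PrintGC output, by comparing collector counts before and after the allocation loop. The following is a minimal sketch, not the original test code: the class name GcEventCounter and its methods are hypothetical, it churns through short-lived arrays instead of filling the heap, and the pauseGC() and resumeGC() calls appear only as comments so that it runs without the native library.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcEventCounter {
    // Sums the collection counts reported by every collector in this JVM.
    // A collector may report -1 when the count is undefined; those are skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates 'iterations' short-lived byte arrays of the given size and
    // returns the total number of bytes requested.
    public static long churn(int iterations, int bytesPerAllocation) {
        long requested = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[bytesPerAllocation];
            garbage[i % bytesPerAllocation] = 1; // touch the array so it is really used
            requested += garbage.length;
        }
        return requested;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        // controller.pauseGC() would go here (assumes a GCController instance)
        long requested = churn(20_000, 64 * 1024);
        // controller.resumeGC() would go here
        long after = totalGcCount();

        System.out.println("Bytes requested: " + requested);
        System.out.println("GC events during allocation: " + (after - before));
    }
}
```

Running it with a small heap (for example the -Xmx128m setting used later in this post) makes collections during the loop more likely; the exact count depends on the JVM and the collector in use.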
My theory was based on the fact that the JVM pauses Java threads when they reach what's called a "safe point", which occurs on method returns, loop iterations, returns from native calls, and so on. Java threads pass through safe points all the time without delay. The JVM only uses these safe points to pause application threads when it needs to do something special, such as invoking the garbage collector or the just-in-time (JIT) compiler.
I thought that by delaying a native call's return to the calling Java code, I could halt the GC. While it's true that this might delay the start of GC, it doesn't stop GC. The JVM just delays the return of your native call instead. On top of that, it can create what appears to your application as an even longer GC pause: while all other application threads are paused at their safe points waiting for your native-calling Java thread to reach its safe point, none of your threads get any work done, and neither does the GC.
To see all of this in action for yourself, download the code, and run the test application with the following java command-line parameters:
> java -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -Xms10m -Xmx128m -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGC -XX:+PrintGCDetails -Djava.library.path=<path to your native library code> gccontroller.GCController
Lesson Learned: Don't Help, Control, or Avoid GC
Long ago I wrote here and in other places that you should never try to help the GC, avoid GC, or otherwise attempt to control GC, because these attempts usually work against you. For instance, people who avoid "creating garbage" by pooling and reusing objects, in an effort to avoid allocating memory, usually hurt the collector. This is because "garbage collection" is a misnomer: the bulk of typical GC work is tracing and moving live objects. Creating large pools of objects creates more work for the GC, which can result in longer pauses. I should have heeded my own advice, but learning from failure is what programming is all about.
In the next blog post, I'll go over other poor Java coding practices that can delay Java safe point entry, making it appear that GC is taking longer than expected, and how JVM implementations attempt to avoid this.
Happy Coding!
-EJB