Terracotta BigMemory and Java GC Pauses
Today I spoke with Amit Pandey, CEO of Terracotta, about the company's new product, BigMemory. Terracotta's business is scalability, and it offers several products, such as Ehcache and Web Sessions, to help customers scale their own applications. As a commercial open-source company, Terracotta makes its products available as open source, with enterprise features that can be purchased on top. The company has been in business for over six years and has hundreds of customers and over 500,000 users of its products.
In a nutshell, Terracotta helps its customers by getting their database data into memory, as close to the application as possible. As customers put ever larger data sets into cache, they require larger and larger Java heaps. As a result, Terracotta and its customers are spending more time dealing with Java garbage collection tuning issues. Until recently, this wasn't a big problem, since Terracotta has multiple Java GC experts on staff. They've been able to tune for Java heaps up to 10GB, but are now getting requests for much larger heaps (up to 100GB in some cases), where GC pauses of five minutes or longer are simply unacceptable.
Terracotta set about solving this problem and did so by taking the cached data out of the Java garbage collector's reach altogether. They've built a pure Java solution that creates an off-heap memory store that's not within the scope of the garbage collector, but is instead handled by their own memory manager. This memory store holds the cached data in what amounts to a very sophisticated hash map that doesn't affect garbage collection in any way. Although storing and accessing data in BigMemory is slightly slower than in the heap, the latency and throughput gained by avoiding GC pauses greatly outweigh this overhead, and of course it's far faster than retrieving data from a disk-backed database.
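To make the off-heap idea concrete, here is a minimal sketch of my own, not Terracotta's code. It assumes the simplest possible design: one direct `ByteBuffer` used as an append-only slab, plus a small on-heap index. The class name `OffHeapStoreSketch` and its methods are hypothetical, and a real store would need eviction, space reuse, and finer-grained concurrency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how BigMemory is implemented.
final class OffHeapStoreSketch {
    // A direct buffer keeps its contents outside the Java heap, so the
    // collector never has to trace or copy the cached bytes.
    private final ByteBuffer slab;

    // Small on-heap index: key -> {offset, length} within the slab.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStoreSketch(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends the value to the slab. Returns false when it does not fit.
    // Overwriting a key strands the old bytes; a real store would reuse them.
    synchronized boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > slab.remaining()) {
            return false;
        }
        int offset = slab.position();
        slab.put(bytes); // advances the slab's position
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    // Copies the value back onto the heap, or returns null if absent.
    synchronized String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = slab.duplicate(); // shares content, own position
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Note the trade-off this sketch exposes: every read copies bytes back onto the heap and decodes them, which is the kind of overhead that makes off-heap access somewhat slower than a plain heap lookup. On HotSpot JVMs, the amount of direct memory available is typically capped with the `-XX:MaxDirectMemorySize` option.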
What is Garbage Collection?
John McCarthy is a computer scientist who worked in the field of artificial intelligence in the early 1950s and beyond. He announced the Lisp programming language in a paper he wrote in 1958, and invented the concept of garbage collection in 1959, mainly as part of Lisp. In short, a garbage collector works to reclaim areas of memory within an application that will never be accessed again. No longer is it the programmer's responsibility to allocate and subsequently deallocate an object's memory. This eliminates many, if not all, of the memory-related errors that come with manual memory management, such as leaks and dangling references. At the most fundamental level, garbage collection involves two deceptively simple steps:
1. Determine which objects can no longer be referenced by an application. This is done by either direct means (such as with object reference counts), or indirect means (where object graphs are traced to determine live and dead objects).
2. Reclaim the memory used by dead objects (the garbage).
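The two steps above can be sketched with a toy tracing collector. This is a teaching illustration under my own assumptions, not how any production JVM collector works: the "heap" is just a list of `Node` objects, and `MarkSweepSketch`, `Node`, `allocate`, and `collect` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep over a simulated heap; illustrative only.
final class MarkSweepSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    final List<Node> heap = new ArrayList<>();  // every allocated object
    final List<Node> roots = new ArrayList<>(); // e.g. stacks and statics

    Node allocate(String name) {
        Node node = new Node(name);
        heap.add(node);
        return node;
    }

    // Runs one full collection and returns the number of objects reclaimed.
    int collect() {
        // Step 1: trace the object graph from the roots, marking live objects.
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.marked) {
                continue; // already visited; this also handles cycles
            }
            node.marked = true;
            pending.addAll(node.refs);
        }

        // Step 2: sweep the heap, reclaiming everything left unmarked.
        int reclaimed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false; // reset for the next collection
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }
}
```

Because liveness is decided by reachability from the roots, two objects that reference only each other are still collected, which is a case that plain reference counting cannot handle on its own.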
Of course, the work to reclaim dead objects takes time, and it must be performed periodically to ensure that enough free memory is consistently available for the application threads to consume. We call application threads mutator threads since, from the collector's point of view, they change the heap. Complexities arise in determining when, for how long, and how often garbage collection activities are to take place. These decisions directly impact the performance and determinism of the running application, since the collector's logic competes with the application for time.
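If you want a rough sense of how much time a running JVM has spent on this work, the standard `java.lang.management` API exposes per-collector counters. The sketch below simply sums them; the class and method names (`GcTimeSketch`, `totalGcMillis`, `totalGcCount`) are my own. Keep in mind that `getCollectionTime()` reports approximate accumulated elapsed time, which for concurrent collectors is not the same as time the application was paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sums the GC counters the JVM exposes; illustrative helper only.
final class GcTimeSketch {
    // Approximate accumulated collection time across all collectors, in ms.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when unsupported
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Total number of collections across all collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when unsupported
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }
}
```

Sampling these two values before and after a workload gives a coarse picture of GC overhead; GC logging remains the better tool for measuring individual pause times.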
There are many algorithms and approaches for garbage collection. The work can be performed in parallel to application threads (which is often referred to as concurrent GC), parallel to other GC-related activities (often referred to as parallel GC), or serially. Some of the GC work can be performed at object allocation time, or all of it can be deferred until free memory falls below a threshold percentage. Java employs various types of garbage collectors and algorithms to best suit different types of applications. For more information on this topic, see my book, Real-Time Java Programming, available from Pearson as part of the Java Series.
BigMemory Benefits to GC
Some of the measurable benefits of using BigMemory for data-intensive applications with large memory demands are:
- It's a pure Java solution (not a new JVM or programming model that requires changes to your application)
- Fewer running JVMs are required to maximize the memory utilization of your existing servers
- BigMemory requires a simple two-line configuration file change to enable it with applications that use Ehcache
- For database applications that use Hibernate today, BigMemory can be integrated with a simple switch
- With beta customers, performance gains and latency improvements have been in the 15x to 100x range:
  - A 100x improvement was measured with customers using disk-based databases over a LAN
  - A 15x improvement was measured with customers using RAM-based solid-state drives (SSDs). Greater performance improvements were seen for customers that use Flash-based SSDs.
- For applications that formerly required large heaps up to the 100GB size range, pause times went from the five-minute range down to 0.4 seconds maximum.
Here are some comments from Steve Harris, VP of Engineering at Terracotta, about how BigMemory works and can be used:
"BigMemory is a 100% pure Java (with no JVM spawning) tiered cache solution (onHeap, off Heap, onDisk) and runs on any 1.5 and up JVM (jrocket, ibm, sun tested) on any operating system that supports those JVMs (64 bit preferred). It has been tested to caches over 350G in size with almost no degradation in performance due to size of the off-heap in memory portion of the cache. The off heap portion is pause-less, fully concurrent and scales with CPU."
BigMemory Availability
Currently, Terracotta has about 18 beta customers actively using BigMemory and ready to deploy with it. Additionally, new customers have been signing up, with the goal of having over 100 committed customers by the release date, which is planned for October 2010. Check out http://www.terracotta.org/bigmemory for more on BigMemory, and Terracotta's other products as well.