G1: Java's Garbage First Garbage Collector


How Does G1 Work?

Garbage-First is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability [Detlefs04]. It does this while also achieving high throughput, which is an important point when comparing it to other real-time collectors.

The G1 collector divides its work into multiple phases, each described below, which operate on a heap broken down into equally sized regions (see Figure 1). In the strictest sense, the heap doesn't contain generational areas, although a subset of the regions can be treated as such. This provides flexibility in how garbage collection is performed, which is adjusted on-the-fly according to the amount of processor time available to the collector.

Figure 1: With garbage-first, the heap is broken into equally sized regions.
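Because every region has the same size, finding the region that holds a given address reduces to one subtraction and one division. The following is a minimal sketch of that lookup, not HotSpot's actual code; the class name RegionMap, the 1 MiB region size, and the heap base address are assumptions chosen for illustration.

```java
// Hypothetical sketch: maps an address to its region, assuming a
// contiguous heap that is split into equally sized regions.
final class RegionMap {
    private final long heapBase;    // address of the first heap byte (assumed)
    private final long regionSize;  // bytes per region (assumed)
    private final int regionCount;

    RegionMap(long heapBase, long regionSize, int regionCount) {
        if (regionSize <= 0 || regionCount <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.heapBase = heapBase;
        this.regionSize = regionSize;
        this.regionCount = regionCount;
    }

    // Index of the region containing the given address.
    int regionIndex(long address) {
        long offset = address - heapBase;
        if (offset < 0 || offset >= regionSize * regionCount) {
            throw new IllegalArgumentException("address outside heap");
        }
        return (int) (offset / regionSize);
    }

    // First address of the region with the given index.
    long regionStart(int index) {
        return heapBase + index * regionSize;
    }

    public static void main(String[] args) {
        // Eight regions of 1 MiB each, starting at an arbitrary base.
        RegionMap map = new RegionMap(0x1000L, 1L << 20, 8);
        System.out.println(map.regionIndex(0x1000L + (3L << 20) + 42));
    }
}
```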

Regions are further broken down into 512-byte sections called cards (see Figure 2). Each card has a corresponding one-byte entry in a global card table, which is used to track which cards are modified by mutator threads. Subsets of these cards are tracked per region and are referred to as Remembered Sets (RS), which are discussed shortly.

Figure 2: Each region has a remembered set of occupied cards.
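To make the card arithmetic concrete, here is a small sketch of a card table with one byte per 512-byte card. It is an illustration only: the CLEAN and DIRTY encodings, the class name CardTable, and the method names are assumptions, not the JVM's internal definitions.

```java
// Hypothetical sketch of a global card table: one byte per 512-byte card.
final class CardTable {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per card, as in the text
    static final byte CLEAN = 0;       // assumed encoding
    static final byte DIRTY = 1;       // assumed encoding

    private final long heapBase;       // address of the first heap byte (assumed)
    private final byte[] cards;        // one entry per card

    CardTable(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        // Round up so that a partial trailing card still gets an entry.
        long cardCount = (heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        this.cards = new byte[(int) cardCount];
    }

    // Index of the card that covers the given address.
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    // Called when a mutator modifies the memory at the given address.
    void markDirty(long address) {
        cards[cardIndex(address)] = DIRTY;
    }

    boolean isDirty(int cardIndex) {
        return cards[cardIndex] == DIRTY;
    }

    int size() {
        return cards.length;
    }
}
```

The shift replaces a division by 512, which is why card sizes are powers of two.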

The G1 collector works in stages. The main stages consist of remembered set (RS) maintenance, concurrent marking, and evacuation pauses. Let's examine these stages now.

RS Maintenance

Each region maintains an associated subset of cards that have recently been written to, called the Remembered Set (RS). Cards are placed in a region's RS via a write barrier, which is an efficient block of code that all mutator threads must execute when modifying an object reference. To be precise, for a particular region (call it region a), only cards that contain pointers from other regions to an object in region a are recorded in region a's RS (see Figure 3). A region's internal references, as well as null references, are ignored.

Figure 3: A region's RS tracks live references from outside the region.
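The filtering rule described above (skip null references, skip references that stay inside one region, record everything else in the target region's RS) can be sketched as follows. This is a simplified model rather than the real barrier: addresses are plain offsets from the heap base, offset 0 is reserved to stand for null, and the names RememberedSets and onReferenceWrite are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of remembered-set maintenance by a write barrier.
final class RememberedSets {
    static final int CARD_SHIFT = 9;   // 512-byte cards
    static final long NULL_REF = 0L;   // assumed encoding of a null reference

    private final long regionSize;
    private final List<Set<Long>> rsByRegion = new ArrayList<>();

    RememberedSets(int regionCount, long regionSize) {
        this.regionSize = regionSize;
        for (int i = 0; i < regionCount; i++) {
            rsByRegion.add(new HashSet<>());
        }
    }

    private int regionOf(long address) {
        return (int) (address / regionSize);
    }

    // Write barrier: runs when the reference field at fieldAddress
    // is updated to point to newTarget.
    void onReferenceWrite(long fieldAddress, long newTarget) {
        if (newTarget == NULL_REF) {
            return;                     // null references are ignored
        }
        int from = regionOf(fieldAddress);
        int to = regionOf(newTarget);
        if (from == to) {
            return;                     // region-internal references are ignored
        }
        // Record the card holding the field in the TARGET region's RS.
        rsByRegion.get(to).add(fieldAddress >>> CARD_SHIFT);
    }

    Set<Long> rememberedSet(int region) {
        return rsByRegion.get(region);
    }
}
```

Note that the card is filed under the region being pointed to, not the region being written, so a region's RS answers the question "who points into me?"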

In reality, each region's remembered set is implemented as a group of collections, with the dirty cards distributed amongst them according to the number of references contained within. Three levels of coarseness are maintained: sparse, fine, and coarse. It's broken up this way so that parallel GC threads can operate on one RS without contention, and can target the regions that will yield the most garbage. However, it's best to think of the RS as one logical set of dirty cards, as the diagrams show.

Concurrent Marking

Concurrent marking identifies live data objects per region, and maintains the pointer to the next free byte, called top. There are, however, small stop-the-world pauses (described further below) that occur to ensure the correct heap state. A marking bitmap is maintained to create a summary view of the live objects within the heap. Each bit in the bitmap corresponds to one word within the heap (an area large enough to contain an object pointer; see Figure 4). A bit in the bitmap is set when the object it represents is determined to be a live object. In reality there are two bitmaps: one for the current collection, and a second for the previously completed collection. This is one way that changes to the heap are tracked over time.

Figure 4: Live objects are indicated with a marking bitmap.
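The pair of bitmaps can be modeled with java.util.BitSet, using one bit per heap word. The sketch below is illustrative only and assumes 8-byte words; the class name MarkBitmap and its method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical model of the current and previous marking bitmaps.
final class MarkBitmap {
    static final int WORD_BYTES = 8;   // assumed word size (64-bit JVM)

    private final long heapBase;       // address of the first heap byte (assumed)
    private final BitSet current;      // bitmap for the collection in progress
    private BitSet previous;           // bitmap of the last completed collection

    MarkBitmap(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        int bits = (int) (heapBytes / WORD_BYTES);
        this.current = new BitSet(bits);
        this.previous = new BitSet(bits);
    }

    // One bit per heap word: the bit for an object is the word index
    // of the object's first word.
    private int bitIndex(long objectAddress) {
        return (int) ((objectAddress - heapBase) / WORD_BYTES);
    }

    // Record that the object starting at this address is live.
    void mark(long objectAddress) {
        current.set(bitIndex(objectAddress));
    }

    boolean isMarked(long objectAddress) {
        return current.get(bitIndex(objectAddress));
    }

    boolean wasMarkedInPreviousCycle(long objectAddress) {
        return previous.get(bitIndex(objectAddress));
    }

    // At the start of a new collection the current bitmap is copied
    // to the previous bitmap, and the current bitmap is cleared.
    void startNewCycle() {
        previous = (BitSet) current.clone();
        current.clear();
    }
}
```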

Marking is done in three stages:

  • Marking Stage. The heap regions are traversed and live objects are marked:

    1. First, since this is the beginning of a new collection, the current marking bitmap is copied to the previous marking bitmap, and then the current marking bitmap is cleared.
    2. Next, all mutator threads are paused while the current TAMS (top at mark start) pointer is moved to point to the same byte in the region as the top (next free byte) pointer.
    3. Next, all objects are traced from their roots, and live objects are marked in the marking bitmap. We now have a snapshot of the heap.
    4. Next, all mutator threads are resumed.
    5. Next, a write barrier is inserted for all mutator threads. This barrier records all new object allocations that take place after the snapshot into change buffers.

  • Re-marking Stage. When the heap reaches a certain percentage filled, as indicated by the number of allocations since the snapshot in the Marking Stage, the heap is re-marked:

    1. As buffers of changed objects fill up, the contained objects are marked in the marking bitmap concurrently.
    2. When all filled buffers have been processed, the mutator threads are paused.
    3. Next, the remaining (partially filled) buffers are processed, and those objects are marked also.

  • Cleanup Stage. When the Re-marking Stage completes, counts of live objects are maintained:

    1. All live objects are counted and recorded, per region, using the marking bitmap.
    2. Next, all mutator threads are paused.
    3. Next, all live-object counts are finalized per region.
    4. The TAMS pointer for the current collection is copied to the previous TAMS pointer (since the current collection is basically complete).
    5. The heap regions are sorted for collection priority according to a cost algorithm. As a result, the regions that will yield the highest numbers of reclaimed objects, at the smallest cost in terms of time, will be collected first. This forms what is called a collection set of regions.
    6. All mutator threads are resumed.

All of this work is done so that objects that are in the collection set are reclaimed as part of the evacuation process. Let's examine this process now.
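The sorting step at the end of the Cleanup Stage can be pictured as ranking regions by how much garbage they yield per unit of collection time, then taking regions in that order. The sketch below is one plausible reading, not G1's actual cost model: the efficiency formula (garbage bytes divided by estimated cost), the pause budget parameter, and all class and field names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of choosing a collection set from per-region statistics.
final class CollectionSetChooser {

    static final class RegionStats {
        final int index;               // region number
        final long garbageBytes;       // region size minus live bytes from marking
        final double estimatedCostMs;  // assumed output of some cost estimate

        RegionStats(int index, long garbageBytes, double estimatedCostMs) {
            if (estimatedCostMs <= 0) {
                throw new IllegalArgumentException("cost must be positive");
            }
            this.index = index;
            this.garbageBytes = garbageBytes;
            this.estimatedCostMs = estimatedCostMs;
        }

        // Reclaimed bytes per millisecond of collection work.
        double efficiency() {
            return garbageBytes / estimatedCostMs;
        }
    }

    // Sort by efficiency, best first, then add regions greedily while
    // their combined estimated cost fits within the pause budget.
    static List<RegionStats> choose(List<RegionStats> regions, double pauseBudgetMs) {
        List<RegionStats> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingDouble(RegionStats::efficiency).reversed());

        List<RegionStats> collectionSet = new ArrayList<>();
        double spentMs = 0;
        for (RegionStats r : sorted) {
            if (spentMs + r.estimatedCostMs > pauseBudgetMs) {
                continue;              // too expensive for the remaining budget
            }
            collectionSet.add(r);
            spentMs += r.estimatedCostMs;
        }
        return collectionSet;
    }
}
```

Ranking by a ratio rather than by raw garbage volume keeps a region that is full of garbage but slow to evacuate from crowding out several cheaper regions.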

