Channels ▼
RSS

JVM Languages

Signalling Integer Overflows in Java

Source Code Accompanies This Article. Download It Now.


Frederic and François are members of the computer science department at the University of Applied Sciences of Western Switzerland. They can be contacted at francois.kilchoer@hefr.ch and frederic.bapst@hefr.ch, respectively.


When computer scientists manipulate numbers, they know that from a mathematical point of view their numbers sometimes behave strangely. For programmers, it's no surprise that a floating-point number may not vary when it is increased by 1.0, or that adding two positive integers can lead to a negative result. The latter phenomenon is known as "integer overflow"—the fixed-length representation of signed integers (that is, 32 bits) leads to a model of so-called "circular arithmetic", in which the greatest positive integer is followed by the smallest negative integer.

Most CPUs usually maintain an "overflow flag" that can be used to detect this anomaly. On the other hand, most popular programming languages (C++, Java...) define arithmetic operations so that an overflow is never signalled. That's why integer overflow is known to be an insidious source of bugs. Of course, overflow is, in fact, an announced feature, and can be used deliberately to obtain interesting results (for instance, to perform a modulo operation inside a random-number generator). But most of the time, when programmers use integers, they would prefer not to be threatened by overflows. If nothing else, it would be useful if overflows could automatically be reported during development and testing.

Instrumentation

To detect an overflow at runtime, one way is to replace every arithmetic operator by a call to a method that performs the equivalent operation, but also checks whether the result suffers from overflow. This is illustrated in Example 1 for the addition of ints. This kind of transformation could be automated at the source-code level, but then you need a full parser for Java. In fact, it is easier to apply the replacement after compilation at the bytecode level, which has a much simpler structure, where every integer operation directly corresponds to a particular bytecode instruction.

<b></b>

(a)  
int a, b, r;
 ...
r = a + b;
 ...


<b>(b)</b>
int a, b, r;
 ...
r = checkedIADD(a,b);
 ...


<b>(c)</b>
package utils;
public class SecuredArithmetics {
  static int checkedIADD(int a, int b) {
    int r=a+b;
    if (( (a^r) & (b^r) ) < 0)
      System.err.println("Overflow!");
    return r;
  }
}
// (a^r)&(b^r))<0) is a shorthand for:
// (a<0 && b<0 && x>=0) ||
// (a>0 && b>0 && x<=0) )

Example 1: Adding overflow checking: (a) before, (b) after instrumentation, (c) possible detection algorithm.

Understanding the instrumentation requires an informal description of the Java Virtual Machine (java.sun.com/docs/books/jvms). Local data and partial results are stored in a frame. Every time a method is invoked, a new frame is created, and destroyed upon method exit (whether normal or not). Frames are allocated on the stack; for our purposes, each frame has its own array of local variables and an operand stack. The array of local variables is organized as 32-bit variables, regardless of the type of the variables.

The array of local variables is determined at compile time. The type of a local variable is one of boolean, byte, char, short, int, float, reference (arrays and objects), or returnAddress; for longs and doubles, a pair of consecutive local variables are required.

The size of the operand stack is also determined at compile time, and this size is included in the code associated with the method. When the frame for the method is created, the stack is initially empty. The JVM provides instructions to load values from local variables or constants into the stack; other instructions take operands from the operand stack, perform the required operation, and push the results back on the stack.

For example, the iadd instruction adds two integer values. It requires that two int values be present on the stack, pushed there by other instructions. Both values are popped off the stack, added, and the sum is pushed back on the stack. A class verifier ensures that attempts to operate on values inappropriate for the instruction are caught. For example, it is illegal to load a long on the stack and attempt to use it as two ints.

As the term "bytecode" suggests, the JVM model defines operation codes (or opcodes) as single bytes, thus limiting the number of instructions the virtual machine can provide. The operations that are provided are not, therefore, orthogonal: there is no add instruction for shorts or bytes. Instead, to perform addition on bytes or shorts, the value must first be converted to int, and only then can the addition be performed. Converting shorts or bytes to integer "widens" the integer without loss of data.

The JVM does not indicate overflow of integer operations (the only exception that JVM generates as a result of integer arithmetic operations is ArithmeticException when dividing by zero). So undetected overflow can occur in several places:

  • The arithmetic instructions add, sub, mul, div, and neg (the last two for the special cases of dividing MIN_VALUE by -1 or negating MIN_VALUE, respectively).
  • During narrowing operations from int to short or byte.
  • During increment operations.

Instrumenting the code therefore involves adding code to the following instructions:

  • Arithmetic instructions. iadd, isub, imul, idiv, ineg, ladd, lsub, lmul, ldiv, lneg, iinc.
  • Narrowing instructions. i2b, i2s, l2i.

We consciously decided that left shifting would be left alone—we think that you are more likely to be aware of possible overflows in this case.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video