Eric Bruno

Dr. Dobb's Bloggers

Increase Java Serialization Performance

July 30, 2013

Working with Java network IO is fairly straightforward. You use a combination of ServerSocket, Socket, InputStream, and OutputStream objects to send data back and forth between two Java applications or Java threads. The simplest way to send and receive a Java object is to use an ObjectOutputStream/ObjectInputStream pair, but is it the most efficient way? Let's experiment.

Java Network IO Summary

First, let's review the basics of Java network IO. To begin, you need to create a ServerSocket that sits and waits for network clients to connect. A client can be any application, even a C++ application, so long as it connects on the correct IP address and port. Here's an example you can add to any Java code to create a connection listener (get the complete code listing here):

        // Create a server socket in its own thread 
        new Thread() {
            public void run() {
                try {
                    ServerSocket clientConnect = new ServerSocket(8081);
                    while ( true ) {
                        Socket client = clientConnect.accept(); // blocks
                        
                        // A new client connected, create a listener for it
                        Listener listener = new Listener(client);
                        listeners.add(listener);
                    }
                }
                catch ( Exception e ) {
                    e.printStackTrace();
                }
            }
        }.start();

In this code, a Thread is created and started, and the work is done in the run() method. The first step is to create a ServerSocket on the localhost where 8081 is the port to listen on. You can use any port; I arbitrarily chose 8081. Next, a call to accept() blocks and waits until a network client connects on the matching IP address and port. The return is a Socket connection to the client that's used to communicate with the client. In this code, I instantiate a Listener object, which is a class I wrote to encapsulate all network communication, and add it to a collection of listeners. This code is wrapped in a while loop so that the ServerSocket will always be there ready to accept each incoming connection. Next, let's dive into the actual network communication.

Using ObjectOutputStream

Before we look at the Listener class (introduced in the code snippet above), let's look at the Sender class I wrote that takes a Java object and sends it over the network to any clients connected:

    public class Sender extends Thread {
        public Sender( ) {
            start();
        }

        public void run() {
            try {
                Socket sender = new Socket("localhost", 8081);
                if ( sender != null && sender.isConnected() ) {
                    ObjectOutputStream oos = 
                        new ObjectOutputStream( 
                            new BufferedOutputStream( sender.getOutputStream() ));
                    
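                    // Note: "messages" is not declared in this snippet; it is
                    // assumed to be a counter field from the complete listing.
                    Message msg = new Message();
                    msg.active = true;
                    msg.userid = messages;
                    msg.username = "User_" + messages;
                    msg.data = this.toString();
                    msg.type = Message.MESSAGE_TYPE_USER;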
                        
                    oos.writeObject(msg);
                    oos.flush();
                }
            }
            catch ( Exception e ) {
                e.printStackTrace();
            }
        }
    }

The Sender class extends Thread and calls Thread.start() in the constructor, which means each Sender object instantiated will result in a new running thread. Within the run() method, the first step is to create a connection to the server (the code that created the ServerSocket above) on the right IP address and port. Next, the resulting Socket's OutputStream object is retrieved via a call to Socket.getOutputStream(), and passed into the constructor of an ObjectOutputStream object.

If you look closely, you'll see there's a BufferedOutputStream object in there as well. Although it's not required, using buffered IO improves efficiency and performance, as an application can write to the underlying output stream without necessarily invoking a call to the underlying system for each byte written. Later, we'll examine the performance differences with and without it.
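To make the buffering effect concrete, here is a minimal sketch that counts how many write calls reach the underlying stream with and without a BufferedOutputStream. The names BufferingDemo and CountingOutputStream are hypothetical and not part of the sample application; the counting stream merely stands in for a socket's OutputStream, where each such call would typically cost a call into the operating system.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferingDemo {
    // Hypothetical stand-in for a socket's OutputStream: it only counts
    // how many write calls reach it.
    static class CountingOutputStream extends OutputStream {
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // Writes n single bytes straight to the sink: one call per byte.
    static int unbufferedCalls(int n) {
        CountingOutputStream sink = new CountingOutputStream();
        for (int i = 0; i < n; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    // Writes n single bytes through a BufferedOutputStream; the sink sees
    // a call only when the buffer fills or flush() is called.
    static int bufferedCalls(int n) {
        try {
            CountingOutputStream sink = new CountingOutputStream();
            BufferedOutputStream out = new BufferedOutputStream(sink, 8192);
            for (int i = 0; i < n; i++) {
                out.write(i);
            }
            out.flush();
            return sink.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + unbufferedCalls(1000));
        System.out.println("buffered:   " + bufferedCalls(1000));
    }
}
```

With 1,000 single-byte writes and an 8 KB buffer, the sink sees 1,000 calls in the unbuffered case and a single call (triggered by flush()) in the buffered case.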

An object can be written by simply calling ObjectOutputStream.writeObject(), followed by a call to flush() to force the bytes to be sent out over the network. In the complete sample application, included below, the Sender code will send 100,000 Message objects — an arbitrary number — in order to measure the time it takes to send and receive them.
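The write-then-flush pattern, repeated many times and timed, can be sketched as follows. This is not the sample application: SendTimingDemo and its trimmed-down Msg class are hypothetical, and an in-memory ByteArrayOutputStream is assumed in place of the socket, so the timing covers serialization cost only and excludes any network latency.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SendTimingDemo {
    // Hypothetical, trimmed-down stand-in for the article's Message class.
    static class Msg implements Serializable {
        private static final long serialVersionUID = 1L;
        Integer userid;
        String username;
    }

    // Serializes count messages into an in-memory sink (assumed here in
    // place of a socket) and returns the number of bytes produced.
    static int sendAll(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new BufferedOutputStream(sink))) {
            for (int i = 0; i < count; i++) {
                Msg msg = new Msg();
                msg.userid = i;
                msg.username = "User_" + i;
                oos.writeObject(msg);
                oos.flush(); // push buffered bytes to the sink after each message
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int bytes = sendAll(100_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes in " + elapsedMs + " ms");
    }
}
```

Absolute numbers from a loop like this depend heavily on JIT warm-up and the machine, so they are best used to compare alternatives run under the same conditions.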

Using ObjectInputStream

On the flip side, the Listener object uses ObjectInputStream to listen for and reconstruct Java objects as they arrive, as shown here:

    public class Listener extends Thread {
        Socket client = null;
        
        public Listener( Socket client ) {
            this.client = client;
            start(); // start the Thread
        }
        
        public void run() {
            try {
                ObjectInputStream ois =
                    new ObjectInputStream( 
                        new BufferedInputStream( client.getInputStream() ));
                
                Message msg = (Message)ois.readObject();
            }
            catch ( Exception e ) {
                e.printStackTrace();
            }
        }
    }

As with Sender, this class extends Thread, which results in a new running thread for each object instantiated. Recall that this class is instantiated when a network client connects to the ServerSocket, with the resulting Socket passed into the constructor. Within the run() method, the Socket's InputStream is retrieved and passed into the constructor of ObjectInputStream. Again, we use buffered IO (BufferedInputStream in this case) for efficiency and performance. To read an object as it's sent, a call is made to ObjectInputStream.readObject(), which blocks until a complete object arrives over the network.
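The Listener above reads a single Message. A longer-lived listener would normally call readObject() in a loop, and one possible way to end that loop is to treat a quit-type message (the Message class defines a MESSAGE_TYPE_QUIT constant) as a stop signal. The following is a minimal sketch under that assumption. ReadLoopDemo, Msg, and the use of the quit type as a sentinel are hypothetical, and in-memory streams replace the socket so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReadLoopDemo {
    static final int TYPE_USER = 1;
    static final int TYPE_QUIT = 2;

    // Hypothetical, trimmed-down stand-in for the article's Message class.
    static class Msg implements Serializable {
        private static final long serialVersionUID = 1L;
        int type;
        String data;

        Msg(int type, String data) {
            this.type = type;
            this.data = data;
        }
    }

    // Serializes userCount user messages followed by one quit message.
    static byte[] writeMessages(int userCount) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < userCount; i++) {
                    oos.writeObject(new Msg(TYPE_USER, "payload " + i));
                }
                oos.writeObject(new Msg(TYPE_QUIT, null));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads messages until the quit message arrives; returns how many
    // user messages were seen before it.
    static int readUntilQuit(byte[] wire) {
        int seen = 0;
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(wire))) {
            while (true) {
                Msg msg = (Msg) ois.readObject(); // blocks on a real socket
                if (msg.type == TYPE_QUIT) {
                    return seen;
                }
                seen++;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readUntilQuit(writeMessages(3)));
    }
}
```

A sentinel message is only one option; a listener could instead stop when readObject() throws java.io.EOFException after the sender closes its end of the connection.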

This is very straightforward, and the actual code behind sending and receiving entire Java objects is minimal. In fact, most of the code in this example involves setting up the network connections. It's important to note that any object sent this way must be serializable, which is achieved by implementing the java.io.Serializable interface. This interface doesn't contain any methods or fields; its mere presence is enough. Here's the Message class used in this example:

    public class Message implements java.io.Serializable {
        private static final long serialVersionUID = 1L;
        public static final int MESSAGE_TYPE_USER = 1;
        public static final int MESSAGE_TYPE_QUIT = 2;

        public Integer type;
        public String username;
        public Integer userid;
        public Boolean active;
        public String data;
    }
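As a quick check of the Serializable requirement, the sketch below round-trips one object whose class implements the interface and shows that writeObject() rejects one whose class does not, by throwing java.io.NotSerializableException. SerializableDemo, Tagged, and Untagged are hypothetical names used only for illustration, and byte arrays are assumed in place of the network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableDemo {
    // Hypothetical class that carries the marker interface.
    static class Tagged implements Serializable {
        private static final long serialVersionUID = 1L;
        String username = "User_1";
        Integer userid = 1;
    }

    // Hypothetical class without the marker interface.
    static class Untagged {
        String username = "User_1";
    }

    // Returns the serialized bytes, or null if the object's class
    // is not serializable.
    static byte[] trySerialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        } catch (NotSerializableException e) {
            return null;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an object from bytes produced by trySerialize().
    static Object deserialize(byte[] wire) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = trySerialize(new Tagged());
        Tagged copy = (Tagged) deserialize(wire);
        System.out.println(copy.username + " in " + wire.length + " bytes");
        System.out.println(trySerialize(new Untagged()) == null
            ? "Untagged: not serializable"
            : "Untagged: serialized");
    }
}
```

The byte count printed for even this two-field object is noticeably larger than the field data alone, because the stream also carries a description of the class.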

Comments:

ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2013-08-08T10:04:01

Yes, thank you, I received your email. You make an excellent point, and thanks for illustrating with a complete working example. I will indeed follow this blog up with another including your excellent suggestion. Thanks again.


ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2013-08-08T10:02:03

I understand and appreciate your argument, and I've experienced this myself. However, there's still an argument that optimization can help. This blog was derived from an experience I had building my own JMS provider, where a similar optimization increased message sending throughput tremendously. In this case, the optimization discussed paid off big time!


ubm_techweb_disqus_sso_-8bc88d8259a6792d7971f32ec29b1f37
2013-08-08T04:16:40

For network scenarios, up to a 60% performance boost can be gained by NOT using out-of-the-box serialization versioning. All you need to do is extend ObjectOutputStream and ObjectInputStream and override writeClassDescriptor and readClassDescriptor. writeClassDescriptor writes just the className (instead of writing the whole metadata for the class). readClassDescriptor reads the className and uses ObjectStreamClass.lookupAny() to build and cache the descriptor. You don't need to override writeObject and readObject (which is hard to maintain). I sent an example to Eric. Maybe he'll discuss it in his next article.


ubm_techweb_disqus_sso_-f5556c08afb65764ed6cd9f93eab3cf0
2013-08-07T18:31:20

While the results in this article don't look unreasonable, I've found that these sorts of micro-optimization tests are really hard to get right. You run a program like this and get really good, consistent results. Then you run the same code in a production environment, where the JIT isn't optimizing just for your 10-50 lines of code, and suddenly your optimizations aren't doing any good at all.

My favorite example is when I tested the speed of java.util.HashSet. If you start by testing creation of large numbers of small sets (e.g. how many 10-item sets can you create in 30 seconds?) and then incrementally increase the size of the sets, you get phenomenally fast performance for small set creation. If you then reverse the order, starting with large sets and working your way down to small ones, you get completely different results.

My rule of thumb is: don't even bother to optimize code surrounding a network call. The 10ms latency introduced by each network round-trip nearly always trumps micro-optimizations such as serialization. And if you're bulk-loading 10,000 objects, you probably aren't worried about a few seconds. (Or if you are, you should probably invest in a few BitSets or primitive arrays instead of real Java objects. But of course, those add a whole new level of programming complexity.)

If I may contradict myself, I do find myself loading thousands of objects and worrying about a few seconds far more often than I'd like. I've got my own home-brew object/relational bridge which I wrote before Spring Hibernate existed, and I support tons of code that depends on it. So I have had to deal with the exact problem described in the article. Just replace the DataOutputStream with a java.sql.ResultSet. My conclusion has been that reflection is a killer if you do it for each instance, but it's easy enough to cache at the start of your bulk loading.


ubm_techweb_disqus_sso_-06a6faddc430162ab6c827d900667643
2013-08-02T16:56:37

Good idea. In short, you're losing the convenience that entire objects are sent and reconstructed with just a few lines of code, as opposed to a line of code per data item sent in my alternative.

In terms of a mid-point, I'll need to think about that a little more. Another alternative I planned to explore is to encode the object as a byte stream (or array) and send the bytes over the wire. This requires a bit more work. The DataInputStream/DataOutputStream approach in this blog may very well be the mid-point.
Thanks!
-EJB


ubm_techweb_disqus_sso_-3ec6365d717bde3ad3041f2b61434ca8
2013-08-02T10:42:35

Perhaps a follow up article explaining what you are losing in order to gain this performance improvement would be worthwhile.

Presumably the standard mechanism is not just 3x slower because it's bad? And perhaps there is a mid-point which offers most of the performance and most of the benefits?

