Testing Complex Systems

Testing Asynchronous Code

Asynchronous code, common in networking and UI work, is probably the most difficult code to test. When you invoke an asynchronous operation, the flow of control returns immediately to the calling code, but the operation hasn't completed yet. In your test code, you need to wait for the operation to complete, then proceed as usual, checking the result, state, and side effects. Asynchronous APIs fall into three broad categories:

  1. Callback-based APIs
  2. Future-based APIs
  3. Queue-based APIs

With Callback-based APIs, you pass a callback function to the asynchronous operation, and it is called when the operation completes. Future-based APIs return an object you can query periodically to check whether the operation has completed. Queue-based APIs deposit the response in a queue you can poll.

In your test, you want to busy-wait for the async operation to complete, either by having your callback function set a flag that the test checks or by checking the future object or queue directly, and then carry on as usual.
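To make the three categories concrete, here is a minimal sketch of one test-side wait per style. The `Doubler` class and its method names are invented for illustration. The callback version uses a `ManualResetEventSlim` as the "flag", which lets the test block with a timeout instead of spinning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical component exposing the same operation in all three styles.
public class Doubler
{
    // 1. Callback-based: onDone is invoked when the work finishes.
    public void ComputeWithCallback(int x, Action<int> onDone) =>
        Task.Run(() => onDone(x * 2));

    // 2. Future-based: the returned Task<int> can be queried for completion.
    public Task<int> ComputeAsFuture(int x) => Task.Run(() => x * 2);

    // 3. Queue-based: the response is deposited in a queue the caller polls.
    public readonly ConcurrentQueue<int> Responses = new ConcurrentQueue<int>();
    public void ComputeToQueue(int x) => Task.Run(() => Responses.Enqueue(x * 2));
}

public static class CompletionChecks
{
    public static int ViaCallback(Doubler d)
    {
        int result = 0;
        var done = new ManualResetEventSlim(false);
        // The callback stores the result and raises the flag.
        d.ComputeWithCallback(21, r => { result = r; done.Set(); });
        if (!done.Wait(2000)) throw new TimeoutException();
        return result;
    }

    public static int ViaFuture(Doubler d)
    {
        Task<int> future = d.ComputeAsFuture(21);
        // Wait(int) returns false if the task has not finished in time.
        if (!future.Wait(2000)) throw new TimeoutException();
        return future.Result;
    }

    public static int ViaQueue(Doubler d)
    {
        d.ComputeToQueue(21);
        int result = 0;
        // Keep trying to dequeue until a response shows up or 2 s pass.
        if (!SpinWait.SpinUntil(() => d.Responses.TryDequeue(out result), 2000))
            throw new TimeoutException();
        return result;
    }
}
```

In all three cases the wait is bounded, so a broken operation makes the test fail rather than hang.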

The events interfaces I favor most of the time are a form of Callback-based APIs. In my networking test, I used the following code to wait for the mind reading session to complete:

            while (_sink.OnReadCompleteCalls.Count == 0)
                Thread.Sleep(100);

This is not ideal. If the code fails and OnReadComplete() is never called, the test will hang. A very common and simple approach is just sleeping for a while:

            // Sleep for 5 seconds to give the operation enough time to complete
            Thread.Sleep(5000);

This is pretty bad, too. If the operation completes in a jiffy, you still have to wait for five seconds; and if you run a battery of tests, it can slow your test run significantly. I recently tested a lot of code where I had to kill, start, and reconfigure a remote RabbitMQ cluster in about 50 test cases. If I used hard-coded sleeps to wait for each operation to complete, I wouldn't be sitting here writing this article.

The best approach is to do periodic checks with a timeout. The pseudo code looks something like the following:

    timeout = Now() + 5 seconds
    ok = false
    while (!ok and Now() < timeout)
        ok = is operation complete?
        if (!ok)
            sleep 100 milliseconds
    if (ok)
        // good
    else
        // oh-oh. operation timed out

This is cumbersome code to write every time you want to wait for an operation to complete. In a future article, I will demonstrate a neat class that compresses the code to the following snippet:

  var ok = Wait.For(3000, () => _sink.OnReadCompleteCalls.Count > 0);

This statement will wait for three seconds, periodically checking the expression, and will bail out immediately when it is true. If it is never true after three seconds, it will exit anyway and 'ok' will be false.
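Until then, here is one way such a helper might look. This is a minimal sketch and not the author's actual class: the `pollMs` parameter and the 100 ms default are assumptions, chosen to match the pseudo code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; not the author's actual Wait class.
public static class Wait
{
    // Evaluates condition every pollMs milliseconds until it returns true
    // or timeoutMs milliseconds have elapsed. Returns true if the condition
    // was observed to hold, false on timeout.
    public static bool For(int timeoutMs, Func<bool> condition, int pollMs = 100)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            long remaining = timeoutMs - clock.ElapsedMilliseconds;
            if (remaining <= 0) return false;
            // Never sleep past the deadline.
            Thread.Sleep((int)Math.Min(pollMs, remaining));
        }
    }
}
```

The condition is always checked once more after the last sleep, so an operation that completes right at the deadline is still reported as a success. The framework's `SpinWait.SpinUntil(condition, millisecondsTimeout)` offers similar semantics without a fixed polling interval.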

Some asynchronous code relies on timers and timeouts. You should make sure these timeouts are configurable, so your test doesn't have to wait 10 minutes to discover what your code is doing when some operation times out.
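A simple way to get there is to inject the timeout instead of hard-coding it. The sketch below uses an invented `ReplyWaiter` class. Production code might construct it with a timeout of minutes, while the test passes a few milliseconds.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical client whose timeout is injected rather than hard-coded.
public class ReplyWaiter
{
    private readonly TimeSpan _timeout;

    public ReplyWaiter(TimeSpan timeout) { _timeout = timeout; }

    // Returns the reply, or "timed out" if none arrives within the timeout.
    public string WaitForReply(Task<string> reply) =>
        reply.Wait(_timeout) ? reply.Result : "timed out";
}
```

A test for the timeout path can then hand in a task that never completes, for example `new TaskCompletionSource<string>().Task`, and get its answer in about 50 ms.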

Testing Multithreaded or Multiprocess Code

Multithreaded code with low-level locks and multiple threads modifying shared resources is a nightmare to test. This should give you pause. In every case I have ever encountered where multithreaded code was too complex to test, the code itself was just too complex, period. There were nasty bugs related to fine-grained locking (not to mention heroic attempts at lock-free algorithms). Consider modern approaches like message passing and share-nothing designs. If you have to manage low-level concurrency, try to minimize the surface area, review and analyze the code carefully, and then bombard it with fuzz testing. I don't recommend trying to mock threads. You will never be able to simulate all the ways real threads can destroy your buggy code and it will just give you a false sense of security.
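As a rough illustration of bombarding code with real threads, the sketch below runs randomized operations against a small lock-protected class and checks an invariant at the end. The `Account` class, the operation mix, and the seed handling are all assumptions made for the example. A passing run does not prove the absence of races; it only raises the odds of catching one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical component under test: a lock-protected account.
public class Account
{
    private readonly object _lock = new object();
    private int _balance;

    public int Balance { get { lock (_lock) return _balance; } }
    public void Deposit(int amount) { lock (_lock) _balance += amount; }

    public bool Withdraw(int amount)
    {
        lock (_lock)
        {
            if (_balance < amount) return false;
            _balance -= amount;
            return true;
        }
    }
}

public static class StressTest
{
    // Starts threadCount real threads, each applying opsPerThread random
    // operations. Returns true if the final balance is non-negative and
    // equals the net amount of all operations that succeeded.
    public static bool Run(int threadCount, int opsPerThread, int seed)
    {
        var account = new Account();
        long expected = 0;
        var threads = new List<Thread>();
        for (int t = 0; t < threadCount; t++)
        {
            var rng = new Random(seed + t); // one generator per thread
            threads.Add(new Thread(() =>
            {
                for (int i = 0; i < opsPerThread; i++)
                {
                    int amount = rng.Next(1, 100);
                    if (rng.Next(2) == 0)
                    {
                        account.Deposit(amount);
                        Interlocked.Add(ref expected, amount);
                    }
                    else if (account.Withdraw(amount))
                    {
                        Interlocked.Add(ref expected, -amount);
                    }
                }
            }));
        }
        threads.ForEach(th => th.Start());
        threads.ForEach(th => th.Join());
        return account.Balance == expected && account.Balance >= 0;
    }
}
```

Removing the `lock` statements from `Account` and rerunning with several threads is a quick way to see whether the test can actually detect lost updates on your machine.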

Testing multiprocess code poses its own challenges. It is similar to multithreaded code in that some operations happen outside your test control flow. The difference is that it is harder to check the state of objects in other processes. There are many inter-process communication (IPC) mechanisms, and you should look into them to get visibility into the state of processes. You can also mock processes (just like you mock servers when testing networking code). If you want to wait for some operation in another process to complete, you can use a file as a lock (each process tries to get an exclusive access to the locked file).
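The file-as-lock idea can be sketched with an exclusive file handle. The `FileLock` helper below is hypothetical, and it relies on the runtime enforcing `FileShare.None`, which is worth verifying on your target platform.

```csharp
using System;
using System.IO;

// Hypothetical helper: a file opened with FileShare.None acts as a lock.
public static class FileLock
{
    // Returns an open handle if the lock was acquired, or null if the file
    // could not be opened exclusively. Note that any IOException is treated
    // as "lock not available" in this sketch.
    public static FileStream TryAcquire(string path)
    {
        try
        {
            return new FileStream(path, FileMode.OpenOrCreate,
                                  FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException)
        {
            return null; // most likely another holder has the file open
        }
    }
}
```

A test can repeatedly call `TryAcquire` with a timeout until the other process releases the file, then dispose the returned stream to release the lock itself.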

Testing Systems Created with Multiple Programming Languages

Many complex systems have components written in different programming languages. As long as the interaction between the components is over a network using common protocols, this is not very interesting. However, sometimes multiple programming languages interact in a more direct way through language bindings. For example, many dynamically typed languages (scripting languages) like Python and Ruby provide a C extension API that allows integration with any C library. The other direction is also very common, where you embed a scripting language in a C/C++ program and allow users to script various items or write plugins in a dynamic language. For example, Lua is often embedded in game engines. JavaScript, by way of Google Chrome's V8, can also play with C++. The Java Virtual Machine is a powerhouse when it comes to multiple languages, and in recent years, it has become fashionable to target the JVM from many languages like Scala, Clojure, Jython, and JRuby. Java itself always had the Java Native Interface (JNI) to interact with C. But the poster child of cross-language development is definitely the .NET framework, which was designed from the start as a multi-language framework. Visual Studio support for multi-language development is superb and even allows you to debug and step through different languages.

The problem with polyglot systems is that error reporting is often not streamlined and failures across the language boundaries can be masked and fail quietly (or the program can just exit). To test such systems, you need to adopt a systematic approach to collect information from both sides of the language boundary. Designing language bindings is a black art. Marshaling data types is often the main culprit. If you have ever tried to Swig a cool C++ class with a bunch of STL types and custom templates, you will never forget it. Swig can generate bindings from C/C++ code to many languages and is very powerful. But as the saying goes: "With great power comes great confusion." If you use Swig/C++, I highly recommend you keep the bindings themselves as just a thin veneer and avoid any extension.

Testing polyglot systems with a core C/C++ engine and a layer of bindings is best done in layers. In general, the native code should be completely agnostic to the fact that it can be accessed from other languages. You should be able to unit test the core engine in C/C++. If you have a layer of scripting code (say, Python) that exercises the native code via bindings, consider mocking the native code. This is usually very easy in dynamic languages via monkey patching.

Sometimes, the polyglot nature of the system plays to your advantage. Writing and running tests is much easier in Python than in C++, and you can find many fine Python unit test frameworks. You don't have to compile and link your tests, and you can even explore your objects in a REPL.

Testing Failure Code Paths and Error Handling

One of the great challenges in complex systems is testing failures and error conditions. The major difficulty is that for every successful interaction, there is usually a single path through the system: an input event arrives (network packet, button click, a message from another process, a timer expires), it is processed, and the system returns a result and/or records the result. Everybody knows how to test the happy path. But every step along the way might fail. For each failure, there is a different response and possible fall-back. Suddenly, the neat interaction branches out into a Hydra and you need to test each head. Here are a few common failures you've probably never tested comprehensively:

  • Input file or directory doesn't exist / has wrong permissions / contains corrupt data
  • Out of memory on any operation
  • Out of disk space
  • Network connection drops out of the blue
  • An external component you call hangs
  • First name string is 30MB (someone is doing your buffer overflow testing for you!)

If you neglect to test how your code behaves in these situations, then you don't know how it will behave at the most critical time. For example, your code might need to alert the operations team when the response time of some server is greater than five seconds. Now, consider the consequences of a bug in this piece of code…

OK, you get the picture. Everything can fail and the error-handling code itself should be tested. But how do you address the seemingly combinatorial explosion of failure code paths? It all starts with proper design. The same old principles of modular, loosely coupled, highly cohesive components will serve you well. Identify the risk of each component and design an error handling policy. For example, for sensitive data, use transactions to ensure you don't end up in an inconsistent state. For components that must always be running, build in redundancy and switch-over capability. For large subsystems that expose an API, make sure to control and safeguard the surface area so no crashes or unexpected exceptions escape to the user (it's OK to throw exceptions as part of the design).

Now that your system is a collection of components with well-defined interactions, you can systematically test how the system behaves if any component fails in one of those well-defined ways. It sounds like a lot of work and it is. The only solace is that this approach pushes out really excellent APIs: very small, with minimal interactions, as few side effects as possible, and few externally exposed failures.
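One way to make this systematic is to inject each well-defined failure in turn and record how the consumer reacts. The sketch below is illustrative only: `IConfigSource`, `Service`, and the fallback policy are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical dependency and consumer, for illustration only.
public interface IConfigSource { string Load(); }

// Test double that fails in one chosen, well-defined way.
public class FailingConfigSource : IConfigSource
{
    private readonly Exception _failure;
    public FailingConfigSource(Exception failure) { _failure = failure; }
    public string Load() { throw _failure; }
}

public class Service
{
    public string Status { get; private set; } = "not started";

    // Assumed policy: fall back to defaults when the file is missing or
    // unreadable, refuse to start when its contents are corrupt.
    public void Start(IConfigSource source)
    {
        try { source.Load(); Status = "running"; }
        catch (FileNotFoundException) { Status = "running with defaults"; }
        catch (UnauthorizedAccessException) { Status = "running with defaults"; }
        catch (InvalidDataException) { Status = "refused to start"; }
    }
}

public static class FailureMatrix
{
    // Injects each failure in turn and records the resulting status,
    // keyed by the exception type name.
    public static Dictionary<string, string> Run()
    {
        var failures = new Exception[]
        {
            new FileNotFoundException(),
            new UnauthorizedAccessException(),
            new InvalidDataException()
        };
        var results = new Dictionary<string, string>();
        foreach (var failure in failures)
        {
            var service = new Service();
            service.Start(new FailingConfigSource(failure));
            results[failure.GetType().Name] = service.Status;
        }
        return results;
    }
}
```

Because the component exposes only a handful of failure types, the table of cases stays small, which is exactly the payoff of a narrow API.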

Let's examine how to test for errors in the sample MindReader application. My goal is to test how the Presenter handles exceptions. This is very easy to do with the mock mind reader. The test passes null references to progress reports and the thought. This will cause the mock mind reader to throw a null reference exception in the Read() method:

        public void MindReader_CrashTest()
        {
            var mockMindReader = new MockMindReader(null, null);
            var presenter = new Presenter(_mockWindow, mockMindReader);
        }


The question was, "How does the Presenter handle a MindReader exception?" and the answer is that it doesn't. It will crash and burn. The test exposed the fragility of the Presenter and now you can come up with a policy (catch the exception and display a message to the user, retry a couple of times quietly, or log the exception and exit). If there is some error-handling mechanism in place, the test can verify that it is indeed invoked properly.
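To show what verifying such a mechanism might look like, here is a sketch using stand-in types. `GuardedPresenter`, `IWindow.ShowError`, and the other names are assumptions and not the MindReader application's real classes. The policy picked for the example is "catch the exception and display a message to the user".

```csharp
using System;
using System.Collections.Generic;

// Stand-in interfaces; hypothetical, not the article's actual types.
public interface IMindReader { string Read(); }
public interface IWindow { void ShowError(string message); }

// Mock that always fails, mimicking the crash described above.
public class ThrowingMindReader : IMindReader
{
    public string Read() { throw new NullReferenceException("no thought"); }
}

// Mock window that records every error message it is asked to display.
public class RecordingWindow : IWindow
{
    public readonly List<string> Errors = new List<string>();
    public void ShowError(string message) { Errors.Add(message); }
}

public class GuardedPresenter
{
    private readonly IWindow _window;
    private readonly IMindReader _reader;

    public GuardedPresenter(IWindow window, IMindReader reader)
    {
        _window = window;
        _reader = reader;
    }

    // Policy chosen for this sketch: catch, report to the user, carry on.
    public bool TryRead()
    {
        try
        {
            _reader.Read();
            return true;
        }
        catch (Exception e)
        {
            _window.ShowError("Mind reading failed: " + e.Message);
            return false;
        }
    }
}
```

The test then asserts that `TryRead()` returns false and that the recording window received exactly one error message, which confirms the handler was invoked rather than bypassed.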

At the tactical level, I recommend using exceptions and letting them propagate to a point where they can be handled properly. Make sure that the state of the system stays intact in the face of an error. Utilize design patterns (such as RAII) and clean up after yourself. If you use a language like C or Go, where there are no exceptions, you just have to be extra diligent.
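In C#, the closest analogue to RAII is `IDisposable` with a `using` block. The sketch below uses a hypothetical `Rollback` scope guard to keep state intact when an exception propagates through an operation. The `Inventory.Move` example and its `validate` callback are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scope guard: runs the registered undo steps on Dispose
// unless Commit() was called first.
public sealed class Rollback : IDisposable
{
    private readonly Stack<Action> _undo = new Stack<Action>();
    private bool _committed;

    public void OnFailure(Action undo) { _undo.Push(undo); }
    public void Commit() { _committed = true; }

    public void Dispose()
    {
        if (_committed) return;
        while (_undo.Count > 0) _undo.Pop()(); // undo in reverse order
    }
}

public static class Inventory
{
    // Moves one item between lists. If validate throws, the item is put
    // back into the source list and the exception keeps propagating.
    public static void Move(List<string> from, List<string> to,
                            string item, Action validate)
    {
        using (var rollback = new Rollback())
        {
            from.Remove(item);
            rollback.OnFailure(() => from.Add(item));
            validate(); // may throw
            to.Add(item);
            rollback.Commit();
        }
    }
}
```

A test for this passes a `validate` callback that throws, catches the exception, and then checks that both lists look the same as before the call.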


Today's software is growing more complex, but with rare exceptions, it is not tested properly. Even software development processes that emphasize testing usually have only limited unit tests. This is often a deliberate cost-benefit decision due to the enormous challenges of deep testing complex systems. The trade-off is time to market vs. quality. In this article, I demonstrated how to deeply test the most challenging aspects of complex systems by relying on tried and true design principles. By using factories, interfaces, and events, any software component or sub-system can be isolated and tested using mocked dependencies.

Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with emphasis on large-scale distributed systems, and is a long-time contributor to Dr. Dobb's.
