Robert holds a doctorate in physical chemistry and is a senior software engineer for Silicon Energy. He can be contacted at firstname.lastname@example.org.
I work with real-time systems related to the energy industry. In this context, "real time" implies a resolution of approximately one minutenot everybody's definition of real time, for sure, but a timescale that often puts serious demands on large systems.
Traditionally, systems that monitored energy usage were optimized for a high degree of accuracy at the expense of timeliness. Data typically would be collected once a month; this worked fine as long as all readings were accurate because energy customers viewed energy as a nonnegotiable cost. By cultivating an image of extreme accuracy, utilities were able to assure customers that they were being provided with a fair deal.
Things have changed dramatically. There is now a real-time energy market where the price of electricity can vary by a factor of 100 or even 1000 over the course of a few hours, and electricity users can monitor and adjust their energy usage in response. Numerous factors have added a new urgency to real-time energy monitoring in recent years including deregulation, the California energy crisis of 2001, and the Enron scandal.
But gathering energy data is not always easy. Legacy systems will not disappear any time soon and, to make matters worse, these systems historically used proprietary data-transmission formats so that data could not be shared with competitors. Retrieving data from disparate systems can be tricky and slow, so any system proposing to do so on a large scaleand in real timemust be flexible, robust, and avoid new bottlenecks in its internal implementation.
Silicon Energy, the company I work for, has created Windows-based software to address these problems. Our approach includes a data-collection framework, transactional message queuing for transmitting data across the wire, and front-end tools for viewing/manipulating data in web browsers. Many of the pieces of the Silicon Energy solution were built by hand in C++ because ready-made solutions were not available when our software was created. In the future, we expect to benefit from technologies such as Microsoft's .NET, which should provide increased ease of development, deployment, scalability, and maintenance.
The heart of Silicon Energy's approach involves "gateway applications" that translate data retrieved from foreign systems into a consistent format to be published to the Silicon Energy database. As Figure 1 illustrates, the gateway sits between the foreign system and Silicon Energy's system. Figure 1 shows an example of a foreign system that identifies DataPoints by a combination of name and channel number. The Silicon Energy database identifies its DataPoints with a single integer, which we call an "ID." The gateway's job is to map the foreign system's DataPoints to Silicon Energy DataPoints, then use that information to post data associated with the DataPoints.
As Silicon Energy has grown, our data transmission needs have increased in volume. For instance, we recently had a need for a component for caching small amounts of configuration information for each DataPoint. Instead of implementing the cache the simple way (which would have meant creating a C++ COM component and moving on), we evaluated how .NET solves the problem compared to COM-based development. In this article, I present the comparison process we went through.
The DataPoint Collection Project
One of the strengths of Silicon Energy's gateways is the ability to extract information from diverse types of data collection hardware/software and assemble it into a single database with a coherent structure. This data collection requires mapping between the external data source and our internal representation. The problem we faced was how to perform this mapping quickly and with a far larger number of DataPoints than in the past. Storing the point mappings to disk after retrieving them from the database, then reading them back every time we received new data from the foreign system, would have been too slow.
The fastest solution in this case is to maintain an in-memory cache of point information. Since this could potentially use large amounts of memory, we had to be judicious in limiting the size of the information available for each DataPoint, but we could specify more hardware requirements (RAM) if the DataPoint cache outgrew the available memory. The primary goal was to search for any random DataPoint as fast as possible, and we were willing to pay the price of a longer time assembling the cache if necessary.
Although the actual DataPoint cache we use is more complex than the subset presented here (available electronically; see "Resource Center," page 5), the requirements are the same:
- The cache must be available to existing Visual Basic 6 apps, which we would modify to take advantage of the cache (we did not have the resources to perform a major rewriting of the app).
- Search speed is the primary performance metric. Secondary metrics include the speed with which a cache could be created, the time it took to sort the cache, and the amount of memory used by the cache for a given number of points.
- The cache must maintain the information necessary to map Silicon Energy DataPoint information to a foreign system's. A Silicon Energy DataPoint is uniquely identified by a 32-bit integer (the DataPoint's ID); the foreign system may identify its DataPoints by a name string or in combination with a channel number. (The IDL description of the IDataPoint interface is also available electronically.)
- The cache needs to keep track of the timezone ID in which the DataPoint resides, so that local timestamps could be quickly converted to UTC.
- In building the cache, the VB6 app needs to perform preprocessing and validation of each DataPoint. The most efficient implementation of a cache interface would use an interface that was "chunky" (see Adam Nathan's .NET and COM: The Complete Interoperability Guide, Sams Publishing, 2002)all DataPoints would be added in one method call rather than a separate method call for each. However, that would require rewriting code that has already gone through testing, so we opted to keep the existing code and add each DataPoint in a separate method call.
- Searches of the cache must be capable of returning several DataPoints at once, if there is more than one DataPoint that matches the search criteria. For instance, the application may need to examine all DataPoints with a specific name, regardless of the channel number. All these DataPoints need to be returned at once from the cache.
Since the DataPoint's ID is unique, only one point may have a given ID in the collection and the name is not unique. Together, the name and channel are guaranteed to create a unique point. This ensures that a single DataPoint can be mapped from the foreign system (using a combination of its name/channel number) to its Silicon Energy representation (using its ID).
The IDataPoints Interface
The IDataPoints interface (available electronically) contains the standard COM collection interface methods (Item, for retrieving DataPoint at a given index; Add, for adding new DataPoints; and Count, for retrieving the current count of DataPoints in the collection). The IDataPoints interface also contains a few extra methods for efficient use for its unique requirements:
- Capacity property (read/write). The client sets this property to reserve the proper amount of memory before adding new points to the cache; building the cache proceeds much faster if this is done than otherwise, regardless of which language is used to implement the cache.
- Sort method. Sorts the cache once all DataPoints have been added.
- FindPoints method. Searches the cache for one or more DataPoints that match the given name and, optionally, channel number. If no channel number is given, all points matching the name are returned.
The C++ and C# implementations of the point cache implement the IDataPoints interface so that the VB6 client app can interact with any implementation without coding changes; switching implementations simply means changing the ProgID string that's passed to VB's CreateObject method.
The Test Client
To test each implementation of the DataPoint cache, we needed a realistic test client written in VB to ensure accurate performance and configuration testing. The client's job is to:
1. Build a DataPoint cache with a variety of DataPoints, where DataPoint aliases are unsorted.
2. Sort the cache.
3. Let users select a known DataPoint from the cache.
4. Let users search for one or more point(s) using the point alias alone or in combination with the channel number.
Steps 1, 2, and 4 need to be timed as precisely as possible for accurate results (step 3 is really only provided to guarantee that at least one DataPoint can be found in step 4).
The source code for the client is available electronically. The UI is a simple dialog-based application with radio buttons for selecting one of three server implementations (unmanaged C++, C# with a reference type, and C# with a value type): an edit field for entering the number of DataPoints to put into the cache, an edit field for entering an index to select from the collection, and so on (see Figure 2).
In conducting this test, we found a heavy performance penalty related to COM interoperation. Consequently, we created a client application that does the same job as the VB6 client, except it is written in C#. The C# client appears identical to the VB6 client, and the code (also available electronically) is similar.
C++ and C# Servers
The C++ server is an ATL COM component, doing little more than maintaining an STL vector for holding the DataPoints. I was able to quickly create this server because I had a firm grasp of COM, am familiar with ATL and its template classes, and have been using STL for several years.
The C# server took longer to write, not because the code is more complex (in fact, it is simpler than its C++ counterpart), but because I'm new to C# and the .NET framework. To complete the C# implementation, I had to learn C# and .NET quirks. For instance, I had to discover how to create a COM-callable wrapper (CCW) for a .NET component, needed to explore the classes and interfaces in the .NET framework, and learn about the difference between reference and value types and how they are used. While none of these were difficult, they take time to master.
Although not hard, creating a CCW for a .NET component consists of a few steps (a reference implementation of the DataPoints class is available electronically):
1. Create a project containing the COM interfaces to be implemented. This isn't strictly necessary, but it is a good practice if you are creating your own, new interfaces.
2. Create a Runtime-Callable Wrapper (RCW) for the interfaces DLL. Open the Visual Studio .NET command prompt, and navigate the directory containing your new interfaces DLL. Run the strong-name utility (sn.exe) to create a new .snk file containing a public and private encryption key. Then run tlbimp.exe, specifying the interfaces DLL and the .snk file as input. You then have a file with the same name as your interfaces DLL in the same directory, except the .dll extension is replaced with .tlb.
3. Add a reference to the interfaces type library to your .NET project. (In the VS.NET IDE, right-click References in the Solution Explorer, click Add Reference, then click Browse in the dialog that pops up. Browse to the same directory and select your RCW.)
4. Insert 'using SiEDataPointCollection;' at the top of your C# file, then inherit your class(es) from the desired interfaces. Implement all the methods and properties of the interfaces.
5. Add the System.Runtime.InteropServices.Guid attribute to each class, using guidgen.exe to generate a new guid for each.
6. Go back to the VS.NET command prompt, navigate to the new .NET project's directory, and run sn.exe to create another .snk file. In your project's AssemblyInfo.cs file, change the [assembly: AssemblyKeyFile()] line to point to your .snk file, as in [assembly:AssemblyKeyFile(@"..\..\SiEPointColl_CSharp.snk")].
7. In VS.NET, select Project->Properties. Under Configuration Properties, select Build, then set "Register for COM interop" to True. That automatically (re)registers your project output in the registry so that COM can find it every time you recompile.
8. Compile your .NET project. If all goes well, it should now be registered as a COM component.
The Price of COM Interop: Building the Collection
The first task in using the DataPoints collection is to insert a bunch of DataPoints. To do this, the VB6 client sets the Capacity property to the number of DataPoints to be added, then adds each DataPoint individually. Adding DataPoints involves creating them, setting their properties, and calling the IDataPoints.Add methodmeaning a lot of trips from VB6, into COM, to the .NET run-time environment, and back. Performance differences between building a collection in an unmanaged C++ COM component and building it in a C# component is a measure of the performance cost associated with COM/.NET interoperability.
Figure 3 shows the time it took to build a collection of 1 million DataPoints in C++ and C# components (tested on a 350-MHz Pentium III with 1-GB RAM, running Windows 2000 Server). Results are from running the VB6 client and the C# client. The differences are substantial. While it takes about six minutes to build the collection in C# from a VB6 client, the same task takes about 15 seconds when using the C# clienta factor of 24 speed difference! Building the C++ collection from the VB6 client is intermediate between these results (one minute), but it takes over twice that long to build the C++ collection from the C# client!
While simple for reasonably skilled developers to implement, COM/.NET interoperability can be expensive from a performance perspective. Solutions include implementing your application entirely in either one and avoiding interop altogether, or when that's not possible, using "chunky" interfaces to minimize the number of transitions between managed and unmanaged code.
Another interesting comparison is between the VB6/C++ implementation and the C#/C# implementation. The C#/C# implementation is 100-percent managed code and beats the unmanaged implementation by a wide margin: 15 seconds versus 60 seconds. The slower time for the unmanaged implementation is a little misleadingit results from a heavy dependence on the COM run time and the way VB6 interacts with it but underscores that .NET is far from being inherently slow. This lays to rest arguments that the .NET run time, with its just-in-time compilation and intermediate language code, compares unfavorably with unmanaged applications.
Value Types Versus Reference Types: Sorting the Collection
The second task is to sort the DataPoint collection once it has been built. This involves a single call to the Sort method; all sorting happens internally to the collection. This method, therefore, tests the speed of an implementation more than COM interop.
In C++, variables can be allocated either on the stack or heap. Stack allocations result in a variable accessed directly and is removed from memory automatically when it goes out of scope, while heap allocations return a pointer to the actual variable and the instance isn't removed from memory until the pointer is passed to the delete keyword. In other words, using a variable on the stack looks like Example 1(a) in C++, while on the heap like Example 1(b).
C# talks about value and reference types rather than using heap/stack terminology; however, the heap and stack still exist. Value types are allocated on the stack (when you create them, you create the actual value); reference types are allocated on the heap (creating them results in a reference to the value, much like a pointer in C++). Intrinsic numeric types (int, float, bool, and so on) are value types, as are custom types declared as structs. Any types declared as classes are reference types. But the syntax for instantiating value and reference types is identical, and neither value nor reference types need to be manually removed from memory. (Value types go out of scope as before while reference types are eventually cleaned up by the garbage collector.) That means the decision of where to allocate an instance is made when the type is defined, not when the allocation occurs. In Example 2, for instance, the variable v is a value type while variable r is a reference type. This sounds okay because the decision of where to allocate new instances is made at design time rather than when they are used. You can enforce an allocation strategy that is most suited to the type: Small, frequently used types are best allocated on the stack; while large, complex types are more appropriate for heap allocation. But it still introduces a potential gotcha for .NET developers.
Consider the DataPoint type held in the DataPoints collection: It is small and simple, containing a few properties and no complex methods. It could, therefore, be declared as either a struct (value type; available electronically) or class (reference type; available electronically). All aspects of declaring and using the DataPoint type are (nearly) identical.
What effect does this have on the DataPoints collection performance? To answer this, I created a C# DLL with both value and reference implementations of IDataPoint. I also created two DataPoints collection classes that implemented the IDataPoints interfaceone to maintain each implementation type of strongly typed DataPoints.
The VB6 and C# test clients allow selecting one of these C# implementations, then building and sorting it. Figure 4 shows the sort times for both the C# and C++ implementations (all using the VB6 client). The results are striking: The value implementation is the slowest, unmanaged C++ is 30 percent faster, and the reference implementation blows both out of the water with more than three times the speed of the value implementation.
Why does sorting take over three times as long for the value implementation as for the reference implementation? After all, both are essentially identical except for that single keyword choice of struct versus class on the DataPoint type. The answer lies in boxing and unboxing.
Boxing is Microsoft's term for encapsulating a value type inside a reference type (usually System.Object). Boxing an instance of a value type is performed in C# by typecasting it to a reference type. What happens when a value type is boxed in this fashion? An instance of the reference type is created (on the stack), containing a copy of the original value. That implies a stack allocation, plus the garbage collector needs to keep track of the new reference type. Unboxing a value is performed by typecasting the value back to the original type, thereby retrieving the actual value from within the reference.
Both C# implementations of the DataPoints collection call the static System.Array.Sort method, which requires that the two flavors of DataPoint inherit from System.IComparable. For each pair of array elements to be compared during the sort, one is typecast to IComparable, the IComparable.CompareTo(object) method is calledimplicitly typecasting the second element to object, and typecasting the second element back to its original type within the CompareTo method. In C# code, then, the comparison of two DataPoints at array indexes x and y looks something like this:
int nCompare = ((IComparable)array[x]).CompareTo((object)array[y]);
and the CompareTo method in each DataPoint implementation looks like this (I'm only comparing names, ignoring the channel numbers):
int CompareTo(object oRHS)
return this.m_sName.CompareTo(((Data Point)oRHS).m_sName);
This typecasting of DataPoint to IComparable and object and back again is cheap for the reference type because it only entails changing the type of the pointer. But each typecast of the value type to a reference type requires a stack allocation, copying the value. Likewise, each typecast from reference type back to the value type repeats the process in reverse. The end result is to slow down the value implementation to the point where using a struct keyword carries a huge performance penalty relative to using class.
Does this mean value types should never be used? Absolutely not, but you have to be careful in how you use them. For instance, I could have created my own QuickSort method to sort the value DataPoints. This method would retain the DataPoint type, eliminating any need for boxing/unboxing, therefore eliminating the performance penalty. Building the DataPoints collection is faster with the value type than the reference type when using the C# client (by about 15 percent), so it might be a better solution if minimizing build times is important. However, the choice is not a simple one, despite the extremely small difference in the code required to swap between the value and reference implementations.
Searching for DataPoints
Obviously, there are big differences between the times required for different implementations in building and sorting the collection, but for very different reasons. Neither of these was our primary concern for the DataPoints collection however; that honor belongs to searching for one or more DataPoints.
The VB6 and C# clients measure the time required to search for DataPoints using the FindPoints method. Compared to building or sorting the collection, the speed of searching does not vary as dramatically between the three different implementations. Figure 5 shows the search times for each using both the VB6 and C# clients. For both clients, the C#/Reference implementation is the fastest, overcoming the penalty of COM interop. The speeds of the C++ implementation and the C#/Value implementations are about the same when using the VB6 client, but C++ is significantly slower than C#/Value when using the C# client.
Notwithstanding these differences, search speeds proved to be almost equal for the three implementations. So even though this was our primary criterion for selecting one implementation, other factors such as build time, sort time, memory usage, and deployment details ultimately contributed more to selecting one implementation over the other two.
Speed isn't the only performance factor we needed to consider for our DataPoints collection. The amount of memory used by each implementation was also important, since we could potentially store millions of DataPoints on the gateway computer. Therefore, I used Performance Monitor to measure the number of private bytes used by each application before and after building the collection for each server implementation. Each test was performed immediately after starting the client application (in other words, I always closed the client application after building each collection, then restarted it to build the next collection). The results in Figure 6 are dramatic: For the VB6 client, the C++ and C#/Value implementations used about the same amount of memory, 70-80 MB But the C#/Reference implementation used over 250 MB! The results for the C# client are also interesting: All three implementations used about the same amount of memory, with the C++ implementation using the most.
The increased memory usage in the C#/Reference implementation was enough to eliminate it from our consideration of the best implementation to use. This was despite the faster search speeds than the C++ implementation with the VB6 client. The memory used by the C#/Value implementation, however, is comparable to that of the C++ implementation.
Based on the findings presented here, we chose to stick with the C++ implementation because:
- Search speedour primary considerationwas approximately equal for all implementations.
- Memory usage was approximately equal for the C#/Value implementation, but much greater in the C#/Reference implementation. This eliminated the C#/Reference implementation from our range of possibilities.
- The time required for building the initial collection was less for the C++ implementation than for either C# implementation, at least when using the VB6 client.
- The sort time for the C++ implementation was intermediate between the C#/Reference (fastest) and C#/Value (slowest) implementations. Since we eliminated the C#/Reference implementation because of its increased memory usage, this ruled in favor of the C++ implementation.
- Aside from performance considerations, neither I nor my colleagues at Silicon Energy are experts (yet) in .NET programming. We are learning, just like many other developers, but we were not willing to entrust a mission-critical application to a new and unfamiliar technology.
Had we used a .NET solution, we would have had to ensure that the .NET framework was installed on each of our clients' computers. Although Microsoft is pushing to get the framework installed on all Windows workstations and servers, the addition of another redistributable to our already complex install was a dissuasion against using .NET.
If we had not been constrained to using a VB6 client rather than a C# (or .NET) client, however, we may well have decided to use .NET for the entire solution. In fact, we have other projects that are being implemented in .NET, mostly for server-side components. The choice of implementation is dependent on the individual project and, despite the fact that the findings presented here can help make that decision, you must weigh your own project's unique requirements.