It was a gloomy Thursday morning as I sauntered into my dingy cube and asked Shane, my counterpart in havoc, how the new production release was going.
"Not bad." (He always says that.)
"You mean we'll be able to go to this week's status meeting today? Remember the last release when we huddled around Chris's cube for hours trying to fix a bug?"
I was pleased to know that the 700 function points that we had added to the application weren't blowing up. And even more pleased about some performance icing we had added to the cake: a framework of proxies built off of nil that saved 120 milliseconds at main retrieval time by delaying the instantiation of a complex business object, plus a second group of proxies that prevented mainframe transactions from being fired off if they weren't needed.
Boy, was I wrong about the icing! Instead, the first dollop of fresh egg landed on my face at 8:32 a.m., from Colorado, of all places. Mysterious walkback files were machine-gunned into the log directory, all from one user: about 70 files in a matter of seconds. A quick call revealed that the poor soul had just tried to do a plain retrieval and update transaction, the kind she had been doing all morning without a problem. This time, however, her PC spit out a cryptic error message: An undefined object locked when it didn't understand one of our persistence helper objects. After a reboot, however, everything worked fine.
Peering into the last of the 70 walkback logs, I immediately noticed something strange. The log didn't begin at the bottom with the items you would normally see, like:
UIProcess(Process)>>#newProcessOn:stackSize:withArguments:named:
receiver = UIProcess:(4/9/99 12:14:34 PM){suspended,3}
arg1 = [] in UIProcess class>>#forkUserInterface
arg2 = 1024
arg3 = ()
arg4 = (4/9/99 12:14:34 PM)
Instead, the log mysteriously started where the undefined object didn't understand something. And the offending class and method was, of course, a concrete proxy: code that I had written! What's more, because the beginning of the stack trace never printed, I didn't have a clue as to how that undefined object came into existence. The class's only constructor ensured that the proxy should have been initialized to one of our persistence helper objects. Obviously, the proxy was being instantiated via another mechanism, but how? Where?
Clearly, my nil descendants also didn't know how to play with the walkback mechanism: every time you touched them in the wrong way, 70 to 100 files would be spit out, and the machine would lock up.
As more and more of our 4,000 users logged in and started pulling the new production release, the volleys of egg mounted. The next two came from Cleveland sites: users with very similar symptoms. Then the East Coast started sputtering: one, then two. By 11:30 a.m., we had over 10 incidents of this mysterious failure, and our friends at the help desk were tired of rebooting machines.
Thousands of users were sailing along, merrily doing transactions with no signs of trouble. We had put two types of proxies into production, all inheriting off the same abstract class that lived right under nil, and the second type of proxy was problem-free? Hmm...
As it happened, I had a flight home scheduled for that afternoon. Since only 12 users out of thousands had been affected, Chris said I could still go home for the weekend. The captain always stays with the ship while we rats... well, the Midwest Express to Milwaukee had plenty of room for rats.
We had built our performance proxies by the book, or rather by two books that, put together, are about as thick as a NASA manual for a Mars flight: Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides (Addison-Wesley, 1994) and The Design Patterns Smalltalk Companion, by Sherman R. Alpert, Kyle Brown and Bobby Woolf (Addison-Wesley, 1998). Our walls were adorned with the right Rational Rose UML diagrams! We had gone through design reviews! We pair-programmed those puppies! We'd done code reviews with the core architecture team! We had tested and retested the proxies for weeks prior to the production release (unit testing, regression testing, automated testing), all without so much as a bump!
By the end of the day, 50 of 4,000 customers had suffered machine lock-up and the log file directory was bursting. All through the weekend, I dreaded the flight back to Cleveland. Sunday evening, I went straight to work, slipped into my cube, logged on to Lotus Notes and discovered, buried amidst 100 irrelevant messages, a notice of an emergency production release that had been delivered on Saturday! One of my proxies had been yanked out of production. My heart sank as I imagined those dreaded words from the director of operations: "Unplug and go home." For a moment I thought, "Cubicide! The only honorable way out."
Chris discovered the key clue: The proxies that bombed were part of cloned object graphs and the ones that never bombed were never cloned. I jumped into Rose and threw some quick sequence diagrams together. The cloning framework deep down makes a call to #shallowCopy, a primitive. However, it is a primitive with a difference: There is Smalltalk code below the primitive call to manually crank out a shallow copy if the virtual machine returns a primitive failure. Of course, I had implemented #shallowCopy in our nil proxy abstract class, so all our object graphs could clone themselves happily, as long as the VM wasn't stressed. Now I suspected that, on very rare occasions, the VM was deciding to do a failure return from the primitive call, perhaps because a global garbage collection was just beginning when it got the call.
In three weeks of automated testing, the VMs never once faltered on the #shallowCopy primitive. And the primitive call never failed for hundreds of thousands of transactions on the first day of the production release. But it did fail 50 times in a somewhat random fashion. A quick note dashed off to the OTI (Object Technology Inc.) lab confirmed my suspicion that the #shallowCopy primitive call is designed to fail if the VM gets the call and there is "not enough memory to allocate the new object quickly. A non-quick allocate would be something that required a garbage collection operation." So, the VM punts the responsibility for object creation and copying back to the Smalltalk code, which lurks below the primitive call in #shallowCopy, when it thinks the action will take too long!
But how could I prove this was actually happening, since we couldn't reproduce the error in our automated testing environment? The first step was to comment out the primitive call in #shallowCopy ("<primitive: VMprObjectShallowCopy>"). Subsequent calls to the method would then always fall into the Smalltalk code and, lo and behold, we got the same behavior that we saw in the original 50 failures. Rapid-fire walkback logs (50 to 70 separate logs) appeared each time we called #shallowCopy on the nil side.
A quick tour through the object side of the image revealed that EsStackFrame>>#debugPrintOn: liked to call #debugPrintString on whatever was being written to the walkback log. And there was part of the problem: #debugPrintString wasn't implemented in our abstract proxy class. Once we implemented it there, each error produced only one log, instead of 50 to 70. When I looked at that one walkback log, I knew why we were failing.
The Smalltalk code that executes after a #shallowCopy primitive failure manually makes a new object via a #basicNew or a #basicNew: to its class and then probes the original object's shape via #instVarAt: and re-creates the same shape on the new object via #instVarAt:put:. We had already implemented #instVarAt: so our proxies could be seen in standard inspectors (it's called by EpInspector>>#selectedValue), but had not needed to implement #instVarAt:put: until now. Once we implemented it in our abstract proxy class, the error never recurred, even when the #shallowCopy primitive call was commented out.
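For readers who think in Java, the trap can be modeled loosely as follows. This is an analogy under stated assumptions, not the VisualAge source: the Slotted interface, PlainObject, HalfFinishedProxy, Copier and the primitiveSucceeds flag are all hypothetical names, with the flag standing in for the VM's decision to honor or fail the primitive.

```java
// Loose Java analogy of the #shallowCopy fallback; every name here is hypothetical.
interface Slotted {
    int slotCount();
    Object instVarAt(int index);                // read hook, used by inspectors too
    void instVarAtPut(int index, Object value); // write hook, used only by the fallback
    Slotted basicNew();                         // fresh, empty instance of the same class
    Slotted primitiveCopy();                    // stands in for the VM primitive
}

class PlainObject implements Slotted {
    private final Object[] slots;
    PlainObject(int size) { slots = new Object[size]; }
    public int slotCount() { return slots.length; }
    public Object instVarAt(int index) { return slots[index]; }
    public void instVarAtPut(int index, Object value) { slots[index] = value; }
    public Slotted basicNew() { return new PlainObject(slots.length); }
    public Slotted primitiveCopy() {
        PlainObject copy = new PlainObject(slots.length);
        System.arraycopy(slots, 0, copy.slots, 0, slots.length);
        return copy;
    }
}

// Models the proxy as first released: readable, but with no write hook.
class HalfFinishedProxy implements Slotted {
    private final Object helper = "persistence helper";
    public int slotCount() { return 1; }
    public Object instVarAt(int index) { return helper; }
    public void instVarAtPut(int index, Object value) {
        throw new UnsupportedOperationException("instVarAtPut not implemented");
    }
    public Slotted basicNew() { return new HalfFinishedProxy(); }
    public Slotted primitiveCopy() { return new HalfFinishedProxy(); }
}

class Copier {
    static Slotted shallowCopy(Slotted original, boolean primitiveSucceeds) {
        if (primitiveSucceeds) {
            return original.primitiveCopy(); // the common, fast path
        }
        // Rare fallback: build an empty object and copy it slot by slot.
        Slotted copy = original.basicNew();
        for (int i = 0; i < original.slotCount(); i++) {
            copy.instVarAtPut(i, original.instVarAt(i));
        }
        return copy;
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        Slotted proxy = new HalfFinishedProxy();
        Copier.shallowCopy(proxy, true);      // fine: the write hook is never touched
        try {
            Copier.shallowCopy(proxy, false); // the rare path exposes the missing hook
        } catch (UnsupportedOperationException e) {
            System.out.println("fallback failed: " + e.getMessage());
        }
    }
}
```

The point of the model is that the write hook is exercised only when the fast path declines, so testing that never forces the slow path will not find the gap.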
However, our testing manager asked an important question: How can we configure our environment so that it traps this kind of error before a production release? Back in my VisualSmalltalk Enterprise days (what a nice Smalltalk, by ParcPlace), I could configure the amount of operating system virtual memory, old space, new space and so on from the command line. And good old VisualAge Smalltalk, by IBM, has some of the same options (see IBM Smalltalk User's Guide Version 4.5). Just constrict some of these measures beyond what is reasonable and, voila, you've got the old VM choking on the #shallowCopy primitive call.
We didn't have to go that far, because, as the fates would have it, our Internet server team had gobbled up the flawed proxy framework as soon as it was released to the production configuration. And they were able to break the framework in their test environment. How? They swamp their servers with a Web version of load runner that puts incredible stress on each VM. It didn't take them long to find me after the walkback logs mushroomed around my code. Happily, I had the whole thing figured out, and when they imported the fixes for #instVarAt:put: and #debugPrintString, everything went fine.
A walk on the nil side sure can be exciting, but if you would rather lead a peaceful life, heed the following:
Start by implementing your nil descendants with the behavior described in Design Patterns or by the Proxy Pattern described in The Design Patterns Smalltalk Companion.
Test whatever you build on the nil side in various browsers and inspectors to make sure you have implemented all the support they require. You might want to put browser support methods in a class extension application that is loaded only during development.
Make sure you can use ObjectSwapper, ObjectLoader and ObjectDumper with your nil subclassed objects and their realSubjects. Include all the SwapperSupport methods needed.
Make sure you can break your nil subclassed objects by sending each one a message it doesn't understand before its realSubject is ever instantiated. That way it will behave properly in the Envy debugger and walkback logs will be generated just once in run time.
Finally, if you like the Visualization tools by IBM and want to make your proxies show up there, you'll have to figure out what Object>>#dtxBecomeMonitored and Object>>#dtxBecomeMonitoredFromCritical do. The code for these is hidden in most ENVY/Developer environments (a multiuser IDE by Object Technology International with version control, configuration management and a reusable component library), and you'll have to implement it properly on the nil side for your proxies to work with IBM's Visualizer.
A Translation for the Non-Smalltalk Fluent
Strong typing is why it takes 53 lines of Java code to deliver just one function point.
In Smalltalk, high-performance proxies are as easy as falling off a log, the log being nil. Java has no simple counterpart to nil, so one has to create one's own root. In this respect, it's not trivial to get around java.lang.Object. Typically, proxies in Java are extended from Object, so they carry all of the latter's baggage, whereas Smalltalk proxies subclassed off of nil carry no state or behavior baggage. Most Java proxies, therefore, do not have the lightweight instantiation advantages of Smalltalk nil proxies. It is precisely this lightweight instantiation that markedly contributes to performance in systems where tens or hundreds of thousands of nil proxies are employed.
In Smalltalk, messages routed to the real subject via the proxy are trapped by overriding #doesNotUnderstand:. But Java has no such convenient dynamic mechanism. The latter's strong typing makes it very hard to achieve a run-time situation where an undefined message is sent to a proxy and that proxy is really supposed to turn around and delegate that message to its real subject. Reflection can be used to get around some of these issues in Java. However, reflection currently cripples performance, and performance is often the reason we build proxies in the first place.
Indeed, when Java folks promote strong typing, Smalltalkers are likely to retort: "Strong typing for weak minds; weak typing for strong minds." Strong typing is part of the reason it takes an average of 53 lines of Java code to deliver an International Function Point Users Group (IFPUG) Level 4 function point, while Smalltalk delivers the same function point in only 21 lines (see www.spr.com/library/0langtbl.htm). Customers pay for functionality, measured in standard function points, and typically don't care what language is under the hood. Java's intentional choice of strong typing also makes construction of proxies and their corresponding real subjects ponderous.
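The reflective route mentioned above can be sketched with the JDK's java.lang.reflect.Proxy and InvocationHandler, which let a single handler trap every call made through an interface, somewhat in the spirit of #doesNotUnderstand:. Treat this as a minimal sketch under assumptions rather than a recommended design: Account, RealAccount, LazyHandler and DynamicProxyDemo are hypothetical names invented for the illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical service interface and real subject, invented for this sketch.
interface Account {
    String owner();
}

class RealAccount implements Account {
    @Override
    public String owner() {
        return "example owner";
    }
}

// One handler serves any interface: it builds the real subject on the
// first call and forwards every call to it reflectively.
class LazyHandler<T> implements InvocationHandler {
    private final Supplier<T> factory;
    private T realSubject;
    int instantiations = 0; // exposed only so the demo can observe laziness

    LazyHandler(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (realSubject == null) {
            realSubject = factory.get(); // deferred instantiation
            instantiations++;
        }
        try {
            return method.invoke(realSubject, args); // reflective delegation
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real subject threw
        }
    }
}

class DynamicProxyDemo {
    static Account lazyAccount(LazyHandler<Account> handler) {
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] { Account.class },
                handler);
    }

    public static void main(String[] args) {
        LazyHandler<Account> handler = new LazyHandler<Account>(RealAccount::new);
        Account account = lazyAccount(handler);
        System.out.println(handler.instantiations); // 0: nothing built yet
        System.out.println(account.owner());        // first call builds the subject
        System.out.println(handler.instantiations); // 1
    }
}
```

Every call pays for a Method.invoke, which is the reflection cost the paragraph above complains about, and the proxy can only stand in for interfaces, not for arbitrary classes.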
Here is some Java code from my colleague Dave Harris; it uses an envelope/letter idiom to implement a proxy that stands in for a server, both of which implement the same interface, which defines the service:
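The listing itself does not appear in this copy of the article. As a stand-in, here is a minimal sketch of the idiom, which should not be mistaken for Dave Harris's code: the names Server, ProxyServer and ConcreteServer are the ones the text uses, while the service() method, its return value and the isInstantiated() helper are assumptions made for illustration.

```java
// Hypothetical sketch of the envelope/letter proxy, not the original listing.
// Server, ProxyServer and ConcreteServer are named in the article;
// service() and isInstantiated() are invented for this example.

// The shared interface that defines the service.
interface Server {
    String service();
}

// The "letter": the real subject that does the actual work.
class ConcreteServer implements Server {
    @Override
    public String service() {
        return "served by ConcreteServer";
    }
}

// The "envelope": stands in for the server and creates it only on first use.
class ProxyServer implements Server {
    private Server realSubject; // stays null until a request arrives

    boolean isInstantiated() {
        return realSubject != null;
    }

    @Override
    public String service() {
        if (realSubject == null) {
            realSubject = new ConcreteServer(); // deferred instantiation
        }
        return realSubject.service(); // explicit, hand-written delegation
    }
}

class StaticProxyDemo {
    public static void main(String[] args) {
        ProxyServer proxy = new ProxyServer();
        System.out.println(proxy.isInstantiated()); // false
        System.out.println(proxy.service());
        System.out.println(proxy.isInstantiated()); // true
    }
}
```

Each method of the service has to be written three times here: once on the interface, once on the proxy and once on the real subject.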
As you can see, this requires a lot of up-front thought. If we need to add a new method to the ConcreteServer, we must also add it in Server and ProxyServer. For a different kind of server we must start all over again. Lots of effort and duplication, compared to Smalltalk. On the other hand, the approach should work and you should be able to build real solutions with it. It does yield what the Java advocates call "the benefits of manifest typing." The boundaries between systems, where uncertainty and change are greatest, are where Java folks feel they most benefit from making interfaces explicit and rigorously enforced.
In C++, we can get closer to the dynamic nature of Smalltalk proxies by overloading the member access operator (->), the counterpart of Java's period. This allows doing additional work whenever an object is dereferenced, and the proxy ends up behaving like a pointer. Methods then do not have to be defined in triplicate (on the interface, on the proxy and on the real subject) as in Java. For a good example of C++ proxies, see pp. 213-215 of Gamma's Design Patterns. For more information on the basics of Smalltalk proxies, see pp. 213-221 of Alpert's The Design Patterns Smalltalk Companion. (One wonders when, if ever, a Design Patterns Java Companion will appear.)
But, take heart, all you folks stuck in J-Land. A group of self-sacrificing Smalltalkers is working feverishly on a technology that emits Java byte codes from Smalltalk source and implements features needed to do things like nil proxies. Stay tuned for further news.
Enoch Sower