Scott is a scientific advisor at Schlumberger's Austin System Center in Austin, Texas, where he was the chief software architect of Schlumberger's new family of wellsite data acquisition systems. He has a Ph.D. in probability and statistics from Michigan State University and he has been programming since 1957. He can be reached through the Internet at guthery@asc.slb.com.
A large number of sweeping claims, particularly with respect to programmer productivity and code reuse, are being made for object-oriented programming (OOP), many of them by firms that are in the business of making and selling object-oriented programming tools. Before we bet a real programming project on this freshly sewn technology, we'd like to ask some questions about object-oriented programming just to make sure the emperor's new clothes are really as fine as these tailors claim.
In scientific and engineering disciplines -- besides computer science and programming -- we accumulate experience, gather evidence, and conduct experiments, which we then generalize to make claims. If one doubts the claims, one is invited to reanalyze the data and replicate the experiments. For some strange reason, programmers have a history of accepting claims blindly without asking for proof. Hope springs eternal, I guess, and dire need grasps any straw. To my knowledge there has been absolutely no evidence gathered or experiments performed to validate the claims made for object-oriented programming, particularly for object-oriented programming in the large.
The biggest OOP projects undertaken to date seem to be the OOP development systems, and the news from this front is not all good [Harrison, 1989]. But OOP isn't supposed to be an end in itself. Esperanto was wonderful for writing books about Esperanto but not much else. If you want to write an OOP system, then OOP is probably just the thing. But if you want to write, say, an accounting system or a reservation system using OOP you're going where no man or woman has gone before. In fact, you'll be performing on yourself the very experiments that the OOP peddlers should have performed to substantiate their claims. Would you accept this situation if OOP were, for example, a new surgical procedure?
The atomic element of object-oriented programming is, not surprisingly, the object. But what is an object? The 51 papers in the IEEE "Tutorial: Object-Oriented Computing" by Gerald Petersen [Petersen, 1987] contain many definitions and descriptions of an object. These definitions come in two basic flavors. One flavor talks about modeling reality and the other talks about encapsulated collections of programming tricks.
Stripped of its fancy jargon, an object is a lexically-scoped subroutine with multiple entry points and persistent state. Object-oriented programming has been around since subroutines were invented in the 1940s. Objects were fully supported in the early programming languages AED-0, Algol, and Fortran II. OOP was, however, regarded as bad programming style by Fortran aficionados. As Admiral Grace Hopper has so often observed, we don't actually do anything new in computing; we just rename the old stuff. Admiral Hopper (at the time Lt. Hopper) was doing object-oriented programming on the Harvard Mark I in 1944 and probably didn't even know it.
Unfortunately, we have completely ignored Rentsch's 1982 plea [Rentsch, 1982]: "Let us hope that we have learned our lesson from structured programming and find out what the term means before we start using it." C++ was first described in April 1980 [Stroustrup, 1980]. Over nine years later, an incompatible Version 2.0 has just been released. The definition of C++ still isn't complete. Can anything that's this hard to define be good for writing understandable and maintainable programs? And if you think it's hard to pin down the definition of an object, just try drawing a bead on the definition of the inheritance links that connect objects.
One of the primary claims of object-oriented programming is that it facilitates the reuse of code. Does it? And if so, at what cost?
The unit of reuse in object-oriented programming is the hierarchy. It is the hierarchy and not the object that is the indivisible whole. Unlike a subroutine library where you can just take what you need and no more, in object-oriented programming you get the whole gorilla even if you just wanted the banana.
The problem is that hierarchies are nonmodular. You can't just clip the objects you want to reuse out of the hierarchy because you don't know (in fact, aren't supposed to know) how the objects are entangled in the hierarchy. So, the cost of OOP-style code reuse seems to be that you must (re)use a lot more code than you want or need. Your system will be bigger, run slower, and cost more to maintain than with subroutine-style reuse. Though there may be situations in which the convenience of the programmer so completely outweighs the interests of the users and the interests of the maintainers, I've never seen one.
If object hierarchies need to be small to control the cost of their reuse then you must be able to get many of them to work together when you build your program. You may, for instance, want to use a polynomial approximation hierarchy, a linked list hierarchy, a communication hierarchy, an indexed record hierarchy, a pop-up menu hierarchy, and a ray-tracing hierarchy all at once.
But how do you combine object hierarchies? Can objects in a C++ mathematics hierarchy send arguments to objects in an Objective-C ray-tracing hierarchy? Sadly, no. What's worse is that you can't even send arguments from one C++ hierarchy to another. There are neither in theory nor in practice any OOP hierarchy combiners.
It is left as an exercise for the OOP programmer to "impedance match" not only between OOP technologies but between OOP hierarchies within a technology. This means doing exactly what you were told you wouldn't have to do: open up the objects and program with respect to the representation of the state inside. The object-oriented programmer must map from one internal representation to another. There is, after all, no reason to suspect that one hierarchy's internal representation of a compound object such as a matrix or a picture is anything like another's. This clearly defeats one of the main advertised benefits of object-oriented programming: namely, hidden internal representation. What you may have saved by not having to write code for objects in the same hierarchy, you now must spend as you write code to map between objects in different hierarchies.
One of the few things that we have learned (again and again) over the last 40 years of programming is that the hard part isn't getting code fragments to work. The hard part is getting them to work together. The name of the game, particularly when it comes to code reuse, is integration at scale. Object-oriented programming makes building code fragments easier, but it makes integration much more difficult. Making the easy parts easier but the hard parts harder is not progress by my lights.
Has any program you've ever written been too fast or even fast enough? What do you do if your object-oriented program isn't fast enough? How do you performance tune an object-oriented program? Indeed, how do you even answer the question, "Where is the program spending its time?"
It's just possible you'll find yourself spending lots of time in one or two of your own methods and can work on making those methods faster using classic techniques. But it's much more likely that you'll find you're spending more time than you care to running around the hierarchy.
There is only one thing you can do: Rearchitect and reorganize the hierarchy itself to make it more efficient and to take into account the way you want to use it. The semantics of the hierarchy thus become a twisted combination of the descriptive reality that the objects came from and the profile of the use your procedural code makes of them. This is not an attractive prospect.
The problem here, of course, is that while the semantics of classic programming languages match the semantics of the underlying hardware, the semantics of object-oriented languages do not. When using classic languages like C or Fortran, if you couldn't bind the problem to the hardware tightly enough to get the performance you needed, you could make this binding tighter yet by resorting to assembly language or even microcode. You can't do this with an object-oriented program because you can't get at the virtual machine that implements the semantics of these languages. They're all hidden away from you in the vendor's compiler and runtime library.
Once again, the programmer is being invited to pass the cost of expedience on to the user of the system. The additional cost of supporting a runtime OOP virtual machine can vary from as little as 50 percent [Thomas, 1989] to as much as 500 percent of the cost of a non-OOP version of the system. This wholesale sacrificing of runtime efficiency to programmer's convenience, this emphasis on the ease with which code is generated to the exclusion of the quality, usability, and maintainability of that code, is not found in any production programming environment with which I am familiar.
Finally, before we leave the topic of hardware, let's not forget the Intel 432. The 432 was OOP in silicon and it failed because it was just too slow. If we couldn't make OOP efficient when we implemented it in hardware why do we think we can make it efficient when we emulate it in software?
Real programs are built by large programming teams, not by individuals or small, closely knit cliques. Because we certainly don't want to imagine that every programmer on a project builds his or her own private object hierarchy, we are faced with the prospect of many programmers working on the same tree. Given something as flexible as an object to work with, it is almost certain that each programmer working on the tree will want to implement a different vision of the reality that the tree is attempting to capture.
One possibility is to appoint an object "czar," the direct analogy of a database administrator. Databases need to be stable, so appointing an administrator to watch over the database schema and carefully coordinate changes to it makes good software engineering sense. Object hierarchies, on the other hand, are deliberately not stable; the hierarchy is the program, after all, and it's the program that we're developing. Imagine having to ask permission of the subroutine czar every time you wanted to write a subroutine.
What really happens? What I've seen in three large (7000+ objects) OOP projects is that because everyone is trying to get his or her job done with a minimal number of dependencies on everyone else, subtrees and subrealities spring up all over the place and new objects and new methods sprout like weeds. There was an object in one of these systems that, when printed, went on for 80 pages. One also finds lots of little private languages for communicating between these subrealities.
Of course, good communication between the team members can attenuate the growth of some of this gratuitous complexity. But, in projects on tight schedules with programmers removed from one another in time, space, and organization, predicating success on good communication adds more risk to an already risky undertaking.
Another distressing property of these multi-programmer hierarchies is that they're difficult to debug. If there is one overarching flaw in OOP, it's debugging. As was noted recently in the OOP newsgroup on USENET, "It has been discovered that C++ provides a remarkable facility for concealing the trivial details of a program -- such as where the bugs are."
While we're passing through this analogy with database management systems, recall that one of the raisons d'être for DBMSs was the separation of data and program. Now, along comes OOP and we're told that mixing data and program is the right thing to do after all. Were we wrong then or are we wrong now?
We have gotten used to mixing languages in our programs. This is industrial-strength code reuse in action; if you can't access it at will, you can't reuse it. You don't have to rewrite a Fortran subroutine into C to use it in a C program, you just call it. Common or at least coercible calling conventions and a uniform linking model have made this possible. One of the many reasons that Lisp has failed as a programming language is that Lisp is a language loner.
How about object-oriented languages? Can you mix Objective-C, Eiffel, CLOS, Actor, Owl, and C++ objects in a tree? Not on your tintype. Object hierarchies are isolated bunker realities just like the language technologies that implement them.
We have learned that there is no all-singing, all-dancing anything in computing. No one language, no one communication protocol, no one operating system, no one graphics package -- no one anything is always right all the time everywhere. We have learned again and again that closed systems are losers. Successful systems have one thing in common -- they can coexist peacefully and gracefully with other systems. Object-oriented programming does not currently have this property, either in concept or in practice. If by "reuse" OOP advocates really mean "reuse when the whole world is just like me" then this is not reuse in any practical or useful sense.
Persistent state means that data obtained from an object cannot be used independently of that object. It means that the very act of obtaining a value invalidates all other previously obtained values. Programmatically, this means that every time you want to use a value you have to retrieve it from the hierarchy. It is a programming error to make a local copy of a value. Hierarchy chasing and the inheritance machinery are not only in the inner loop of every orthodox object-oriented program, they are part and parcel of every use of every value in the program.
But, persistent state isn't only a performance issue. It is much more importantly a data consistency issue. The only correct way to get two or more consistent values from an object hierarchy is to get them together in one package in one response to one query. This is the only way you can be assured that they are consistent each with the other. Not only have I not seen any discussion of this property of OOP, I have seen example object-oriented programs that don't understand the consequences of persistent state and simply assume consistency between values obtained serially. Without explicit assurances from the designers of the hierarchy in use, this is an error.
There was a very good reason why persistent state was regarded as bad Fortran programming style -- it's a semantic mine field. Why do we have to completely rediscover the principles of good programming with each new programming language and paradigm we invent? In all but very restricted and tightly controlled situations, experience has shown that persistent state should be avoided. For those who haven't taken a stroll in this mine field, object-oriented programming offers the opportunity to avoid reusing others' experience and to learn for themselves. As with modularity and the separation of programs and data, OOP seems content to simply ignore what we have learned in 40 years of programming.
A common failing of many programming aids such as OOP is that you can't get rid of them when you're done with them. They're like training wheels on a bicycle except you can't take them off when you've learned to ride. Programming languages such as Lisp and methodologies such as OOP are particularly painful because they are based on a virtual machine that sits between you and the real machine. The virtual machine is a nice warm-fuzzy to have during development but we simply can't afford to have it in our production systems.
Structured programming was such a success because you got all the benefits of enhanced software productivity without any runtime penalty. We don't know yet what the minimal runtime cost of OOP is but our inability to measure it and hence engineer it should certainly give us pause. I'm uncomfortable working with a programming paradigm whose runtime cost I can't even estimate, let alone eliminate.
Object-oriented programming runs counter to much prevailing programming practice and experience: In allocating and controlling software costs, in modularity, in persistent state, in reuse, in interoperability, and in the separation of data and program. Running counter to prevailing wisdom does not, of course, automatically make an innovation suspect but neither does it automatically recommend it. To date, in my opinion, advocates of object-oriented programming have not provided us with either the qualitative arguments or the quantitative evidence we need to discard the lessons painfully learned during the first 40 years of programming.
Harrison, William H., John J. Shilling, and Peter F. Sweeney, "Good News, Bad News: Experience Building a Software Development Environment Using the Object-Oriented Paradigm," IBM Technical Report RC 14493, March 3, 1989.
Petersen, Gerald E., "Tutorial: Object-Oriented Computing," Computer Society Press of the IEEE, Order Numbers 821 and 822, 1987.
Rentsch, Tim, "Object-Oriented Programming," SIGPLAN Notices, Volume 17, Number 9, (September 1982) pp. 51-57.
Stroustrup, Bjarne, "Classes: An Abstract Data Type Facility for the C Language," Bell Laboratories Computing Science Technical Report No. 84, April 3, 1980.
Thomas, Dave, "The Time/Space Requirements of Object-Oriented Programs," Journal of Object Oriented Programming, March/April 1989, pp. 71-73.
Copyright © 1989, Dr. Dobb's Journal