The differences between the emerging component-based development and long-standing object-oriented development are often unclear. Find out how separate these concepts really are.
What is the rationale behind component software? Or rather, what is it that components should be? Traditionally, closed solutions with proprietary interfaces addressed most customers' needs. Heavyweights such as operating systems and database engines are among the few examples of components that did reach high levels of maturity. Large software systems manufacturers often configure delivered solutions by combining modules in a client-specific way. However, the interfaces between such modules tend to be proprietary, at most open to highly specialized independent software vendors (ISVs) that specifically produce further modules for such systems. In many cases, these modules are fused together during a linking step and are no longer distinguishable in deployed solutions.
Attempts to create low-level connection standards or wiring standards are either product-driven or standard-driven. The Microsoft standards, resting on COM, have always been product-driven and are thus incremental, evolutionary, and, to a degree, legacy-laden by nature.
Standard-driven approaches usually originate in industry consortia. The prime example here is the effort of the Object Management Group (OMG). However, the OMG hasn't contributed much in the component world and is now falling back on JavaSoft's Enterprise JavaBeans standards for components, although it's attempting a CORBA Beans generalization. The EJB standard still has a long way to go; so far it is not implementation language-neutral, and bridging standards to Java-external services and components are only emerging.
At first, it might surprise you that component software is largely pushed by desktop- and Internet-based solutions. On second thought, this should not surprise you at all. Component software is a complex technology to master, and viable, component-based solutions will only evolve if the benefits are clear. Traditional enterprise computing has many benefits, but these benefits all depend on enterprises willing to evolve substantially.
In the desktop and Internet worlds, the situation is different. Centralized control over what information is processed when and where is not an option in these worlds. Instead, content (such as web pages or documents) arrives at a user's machine and needs to be processed there and then. With a rapidly exploding variety of content types (and open coding standards such as XML), monolithic applications have long reached their limits. Beyond the flexibility of component software is its capability to dynamically grow to address changing needs.
What a Component Is and Is Not
The separate existence and mobility of components, as witnessed by Java applets or ActiveX components, can make components look similar to objects. People often use the words "component" and "object" interchangeably. In addition, they use constructions such as "component object." Objects are said to be instances of classes or clones of prototype objects. Objects and components both make their services available through interfaces. Language designers add further irritation by discussing namespaces, modules, packages, and so on. I will try to unfold, explain, and justify these terms. First, I'll browse the key terms with brief explanations, relating them to each other. Based on this, I'll then look at a refined component definition. Finally, I'll shed some light on the fine line between component-based programming and component assembly.
Terms and Concepts
Components. A component's characteristic properties are that it is a unit of independent deployment, it is a unit of third-party composition, and it has no persistent state.
These properties have several implications. For a component to be independently deployable, it needs to be well-separated from its environment and from other components. A component therefore encapsulates its constituent features. Also, since it is a unit of deployment, you never partially deploy a component.
If a third party needs to compose a component with other components, the component must be self-contained. (A third party is one that you cannot expect to access the construction details of all the components involved.) Also, the component needs to come with clear specifications of what it provides and what it requires. In other words, a component needs to encapsulate its implementation and interact with its environment through well-defined interfaces and platform assumptions only. It's also generally useful to minimize hard-wired dependencies in favor of externally configurable providers.
Finally, you cannot distinguish a component without any persistent state from copies of its own. (Exceptions to this rule are attributes not contributing to the component's functionality, such as serial numbers used for accounting.) Without state, a component can be loaded into and activated in a particular system, but in any given process, there will be at most one copy of a particular component. So, while it is useful to ask whether a particular component is available or not, it isn't useful to ask about the number of copies of that component. (Note that a component may simultaneously exist in different versions. However, these are not copies of a component, but rather different components related to each other by a versioning scheme.)
In many current approaches, components are heavyweights. For example, a database server could be a component. If there is only one database maintained by this class of server, then it is easy to confuse the instance with the concept. For example, you might see the database server together with the database as a component with persistent state. According to the definition described previously, this instance of the database concept is not a component. Instead, the static database server program is a component, and it supports a single instance: the database object. This separation of the immutable plan from the mutable instances is the key to avoiding massive maintenance problems. If components could be mutable, that is, have state, then no two installations of the same component would have the same properties. The differentiation of components and objects is thus fundamentally about differentiating between static properties that hold for a particular configuration and dynamic properties of any particular computational scenario. Drawing this line carefully is essential to curbing manageability, configurability, and version control problems.
Objects. The notions of instantiation, identity, and encapsulation lead to the notion of objects. In contrast to the properties characterizing components, an object's characteristic properties are that it is a unit of instantiation (it has a unique identity), it has state that can be persistent, and it encapsulates its state and behavior.
Again, several object properties follow directly. Since an object is a unit of instantiation, it cannot be partially instantiated. Since an object has individual state, it also needs a unique identity to identify the object, despite state changes, for its lifetime. Consider the apocryphal story about George Washington's axe, which had five new handles and four new axe-heads, but was still George Washington's axe. This is typical of real-life objects: nothing but their abstract identity remains stable over time.
Since objects get instantiated, you need a construction plan that describes the new object's state space, initial state, and behavior before the object can exist. Such a plan may be explicitly available and is then called a class. Alternatively, it may be implicitly available in the form of an object that already exists, that is close to the object to be created, and can be cloned. Such a preexisting object is called a prototype object.
Whether using classes or prototype objects, the newly instantiated object needs to be set to an initial state. The initial state needs to be a valid state of the constructed object, but it may also depend on parameters specified by the client asking for the new object. The code that is required to control object creation and initialization could be a static procedure, usually called a constructor. Alternatively, it can be an object of its own, usually called an object factory, or factory for short.
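The three creation routes just described (constructor, cloning from a prototype, and factory object) can be sketched in Java as follows. This is only a minimal illustration; the names Counter, CounterFactory, and copy are hypothetical and do not come from any particular component framework.

```java
import java.util.function.Supplier;

// Hypothetical example type, used only to illustrate the three creation routes.
final class Counter {
    private int value;

    // Constructor: a static construction plan that establishes a valid initial
    // state, parameterized by the client asking for the new object.
    Counter(int start) {
        if (start < 0) {
            throw new IllegalArgumentException("start must be non-negative");
        }
        this.value = start;
    }

    // Prototype-style creation: a new object whose initial state is copied
    // from an object that already exists.
    Counter copy() {
        return new Counter(value);
    }

    void increment() {
        value++;
    }

    int value() {
        return value;
    }
}

// Factory: the code controlling creation is itself an object.
final class CounterFactory implements Supplier<Counter> {
    private final int start;

    CounterFactory(int start) {
        this.start = start;
    }

    @Override
    public Counter get() {
        return new Counter(start);
    }
}

final class CreationDemo {
    public static void main(String[] args) {
        Counter a = new Counter(5);               // via constructor
        Counter b = a.copy();                     // via prototype
        b.increment();                            // changes b only
        Counter c = new CounterFactory(10).get(); // via factory
        System.out.println(a.value() + " " + b.value() + " " + c.value()); // prints "5 6 10"
        System.out.println(a == b);               // prints "false": distinct identities
    }
}
```

In each case the new object starts in a valid state and has its own identity, regardless of which route produced it.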
Object References and Persistent Objects
The object's identity is usually captured by an object reference. Most programming languages do not explicitly support object references; language-level references hold unique references of objects (usually their addresses in memory), but there is no direct high-level support to manipulate the reference as such. (Languages like C provide low-level address manipulation facilities.) Distinguishing between an object (a triple of identity, state, and implementing class) and an object reference (which just holds the identity) is important when considering persistence. As I'll describe later, almost all so-called persistence schemes just preserve an object's state and class, but not its absolute identity. An exception is CORBA, which defines interoperable object references (IORs) as stable entities (which are really objects). Storing an IOR makes the pure object identity persist.
Components and Objects
Typically, a component comes to life through objects and therefore would normally contain one or more classes or immutable prototype objects. In addition, it might contain a set of immutable objects that capture default initial state and other component resources. However, there is no need for a component to contain only classes or any classes at all. A component could contain traditional procedures and even have global (static) variables; or it may be realized in its entirety using a functional programming approach, an assembly language, or any other approach. Objects created in a component, or references to such objects, can become visible to the component's clients, usually other components. If only objects become visible to clients, there is no way to tell whether or not a component is purely object-oriented inside.
A component may contain multiple classes, but a class is necessarily confined to a single component; partial deployment of a class wouldn't normally make sense. Just as classes can depend on other classes (inheritance), components can depend on other components (import). The superclasses of a class do not necessarily need to reside in the same component as the class. Where a class has a superclass in another component, the inheritance relation crosses component boundaries. Whether or not inheritance across components is a good thing is the focus of heated debate. The theoretical reasoning behind this clash is interesting and close to the essence of component orientation, but it's beyond the scope of this article.
Components are rather close to modules, as introduced by modular languages in the early 1980s. The most popular modular languages are Modula-2 and Ada. In Ada, modules are called packages, but the concepts are almost identical. An important hallmark of modular approaches is the support of separate compilation, including the ability to properly type-check across module boundaries.
With the introduction of the Eiffel language, the claim was that a class is a better module. This seemed justified based on the early ideas that modules would each implement one abstract data type (ADT). After all, you can look at a class as implementing an ADT, with the additional properties of inheritance and polymorphism. However, modules can be used, and always have been used, to package multiple entities, such as ADTs or indeed classes, into one unit. Also, modules do not have a concept of instantiation, while classes do. (In module-less languages, this leads to the construction of static classes that essentially serve as simple modules.)
Recent language designs, such as Oberon, Modula-3, and Component Pascal, keep the modules and classes separate. (In Java, a package is somewhat weaker than a module and mostly serves namespace control purposes.) Also, a module can contain multiple classes. Where classes inherit from each other, they can do so across module boundaries. You can see modules as minimal components. Even modules that do not contain any classes can function as components.
Nevertheless, module concepts don't normally support one aspect of full-fledged components. There are no persistent immutable resources that come with a module, beyond what has been hardwired as constants in the code. Resources parameterize a component. Replacing these resources lets you version a component without needing to recompile; localization is an example. Modification of resources may look like a form of mutable component state. However, since components are not supposed to modify their own resources (or their code!), the distinction between resources and state remains useful: resources fall into the same category as the compiled code that forms part of a component.
Component technology unavoidably leads to modular solutions. The software engineering benefits can thus justify initial investment into component technology, even if you don't foresee component markets.
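As a rough illustration of resources parameterizing a component, the following Java sketch keeps the code fixed and swaps only an immutable resource set to localize a greeting. The Greeter class and the "greeting" key are assumptions made for this example; a real system would more likely load such resources from files shipped alongside the compiled code. Map.of and Map.copyOf require Java 10 or later.

```java
import java.util.Map;

// Hypothetical component code: the same compiled class serves every locale.
final class Greeter {
    // Immutable resource set supplied with the component; never modified at run time.
    private final Map<String, String> resources;

    Greeter(Map<String, String> resources) {
        this.resources = Map.copyOf(resources); // defensive, unmodifiable copy
    }

    String greet(String name) {
        return resources.get("greeting") + ", " + name + "!";
    }
}

final class ResourceDemo {
    public static void main(String[] args) {
        // Two "versions" of the component that differ only in their resources.
        Greeter english = new Greeter(Map.of("greeting", "Hello"));
        Greeter german = new Greeter(Map.of("greeting", "Guten Tag"));
        System.out.println(english.greet("Ada")); // prints "Hello, Ada!"
        System.out.println(german.greet("Ada"));  // prints "Guten Tag, Ada!"
    }
}
```

The component never writes to its resources, so they behave like part of the frozen code rather than like mutable state.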
It is possible to go beyond the technical level of reducing components to better modules. To do so, it is helpful to define components differently.
Component: A Definition
A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties. (Workshop on Component-Oriented Programming, ECOOP, 1996.)
This definition covers the characteristic properties of components I've discussed. It covers technical aspects such as independence, contractual interfaces, and composition, and also market-related aspects such as third parties and deployment. It is the unique property of components, not only of software components, to combine technical and market aspects. A purely technical interpretation of this view maps this component concept back to that of modules, as illustrated in the following definition: A component is a set of simultaneously deployed atomic components. An atomic component is a module plus a set of resources.
This distinction of components and atomic components caters to the fact that most atomic components are not deployed individually, although they could be. Instead, atomic components normally belong to a set of components, and a typical deployment will cover the entire set.
Atomic components are the elementary units of deployment, versioning, and replacement; although it's not usually done, individual deployment is possible. A module is thus an atomic component with no separate resources. (Java packages are not modules, but the atomic units of deployment in Java are class files. A single package is compiled into many class files, one per class.)
A module is a set of classes and possibly non-object-oriented constructs, such as procedures or functions. Modules may statically require the presence of other modules in order to work. Hence, you can only deploy a module if all the modules it depends on are available. The dependency graph must be acyclic or else a group of modules in a cyclic dependency relation would always require simultaneous deployment, violating the defining property of modules.
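The acyclicity requirement can be made concrete with a small sketch: a depth-first topological sort that yields an order in which each module is deployed after everything it requires, and that rejects cyclic dependency graphs. The DeploymentPlanner class and the module names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DeploymentPlanner {
    // Returns the modules in an order where each module appears after all modules
    // it requires. Throws IllegalStateException if the dependencies form a cycle.
    static List<String> deploymentOrder(Map<String, List<String>> requires) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : requires.keySet()) {
            visit(module, requires, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> requires,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return; // already placed in the order
        }
        if (!inProgress.add(module)) {
            // We reached a module that is still being visited: a cycle.
            throw new IllegalStateException("cyclic dependency involving " + module);
        }
        for (String dependency : requires.getOrDefault(module, List.of())) {
            visit(dependency, requires, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module); // all requirements are already in the list
    }
}

final class DeploymentDemo {
    public static void main(String[] args) {
        Map<String, List<String>> requires = new LinkedHashMap<>();
        requires.put("editor", List.of("spellcheck", "files"));
        requires.put("spellcheck", List.of("files"));
        requires.put("files", List.of());
        System.out.println(DeploymentPlanner.deploymentOrder(requires));
        // prints "[files, spellcheck, editor]"
    }
}
```

If two modules required each other, no such order would exist, and the pair could only ever be deployed together.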
A resource is a frozen collection of typed items. The resource concept could include code resources to subsume modules. The point here is that there are resources besides the ones generated by a compiler compiling a module or package. In a pure object approach, resources are serialized immutable objects. They're immutable because components have no persistent identity. You cannot distinguish between duplicates.
A component's interfaces define its access points. These points let a component's clients, usually components themselves, access the component's services. Normally, a component has multiple interfaces corresponding to different access points. Each access point may provide a different service, catering to different client needs. It's important to emphasize the contractual nature of interface specifications. Since the component and its clients are developed in mutual ignorance, the standardized contract must form a common ground for successful interaction.
What nontechnical aspects do contractual interfaces need to obey to be successful? First, keep the economy of scale in mind. Some of a component's services may be less popular than others, but if none are popular and the particular combination of offered services is not either, the component has no market. In such a case, the overhead cost of casting a particular solution into a component form may not be justified.
Notice, however, that individual adaptations of component systems can lead to developing components that have no market. In this situation, component system extensions should build on what the system provides, and the easiest way of achieving this may be to develop the extension in component form. In this case, the economic argument applies indirectly: while the extending component itself is not viable, the resulting combination with the extended component system is.
Second, you must avoid undue market fragmentation, as it threatens the viability of components. You must also minimize redundant introductions of similar interfaces. In a market economy, such a minimization is usually the result of either early standardization efforts in a market segment or the result of fierce eliminating competition. In the former case, the danger is suboptimality due to committee design, in the latter case it is suboptimality due to the nontechnical nature of market forces.
Third, to maximize the reach of an interface specification, and of components implementing this interface, you need common media to publicize and advertise interfaces and components. If nothing else, this requires a small number of widely accepted unique naming schemes. Just as ISBN (International Standard Book Number) is a worldwide and unique naming scheme to identify any published book, developers need a similar scheme to refer abstractly to interfaces by name. Like an ISBN, a component identifier is not required to carry any meaning. An ISBN consists of a country code, a publisher code, a publisher-assigned serial number, and a checking digit. While it reveals the book's publisher, it does not code the book's contents. The book title may hint at the meaning, but it's not guaranteed to be unique.
Explicit Context Dependencies
Besides specifying provided interfaces, the previous definition of components also requires components to specify their needs. That is, the definition requires specification of what the deployment environment will need to provide, so that the components can function. These needs are called context dependencies, referring to the context of composition and deployment. If there were only one software component world, it would suffice to enumerate required interfaces of other components to specify all context dependencies. For example, a mail-merge component would specify that it needs a file system interface. Note that with today's components, even this list of required interfaces is not normally available. The emphasis is usually just on provided interfaces.
In reality, several component worlds coexist, compete, and conflict with each other. At least three major worlds are now emerging, based on OMG's CORBA, Sun's Java, and Microsoft's COM. In addition, component worlds are fragmented by the various computing and networking platforms. This is not likely to change soon. Just as the market has so far tolerated a surprising multitude of operating systems, there will be room for multiple component worlds. Where multiple worlds share markets, a component's context dependencies specification must include its required interfaces and the component world (or worlds) for which it has been prepared.
There will, of course, also be secondary markets for cross-component-world integration. In analogy, consider the thriving market for power-plug adapters for electrical devices. Thus, bridging solutions, such as the OMG's COM and CORBA Interworking standard, mitigate chasms.
Obviously, a component is most useful if it offers the right set of interfaces and has no restricting context dependencies; that is, if it can perform in all component worlds and requires no interface beyond those whose availability is guaranteed by the different component worlds. However, few components, if any, would be able to perform under such weak environmental guarantees. Technically, a component could come with all required software bundled in, but that would clearly defeat the purpose of using components in the first place. Note that part of the environmental requirements is the machine on which the component can execute. In the case of a virtual machine, such as the Java Virtual Machine, this is a straightforward part of the component world specification. On native code platforms, a mechanism such as Apple's fat binaries (which pack multiple binaries into one file) would still allow a component to run everywhere.
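The mail-merge example can be sketched in Java to show the difference between a provided and a required interface. All names here (FileSystem, MailMerge, SimpleMailMerge) are hypothetical, and the placeholder syntax with braces is an assumption made for the example. The required interface is passed in explicitly by the environment rather than being hard-wired into the component.

```java
import java.util.Map;
import java.util.Objects;

// Required interface: what the deployment environment must supply (hypothetical).
interface FileSystem {
    String read(String path);
}

// Provided interface: what the component offers to its clients (hypothetical).
interface MailMerge {
    String merge(String templatePath, Map<String, String> fields);
}

// The component implementation states its context dependency in its constructor.
final class SimpleMailMerge implements MailMerge {
    private final FileSystem files;

    SimpleMailMerge(FileSystem files) {
        this.files = Objects.requireNonNull(files, "a FileSystem is required");
    }

    @Override
    public String merge(String templatePath, Map<String, String> fields) {
        String text = files.read(templatePath);
        // Replace each {key} placeholder with the corresponding field value.
        for (Map.Entry<String, String> field : fields.entrySet()) {
            text = text.replace("{" + field.getKey() + "}", field.getValue());
        }
        return text;
    }
}

final class MailMergeDemo {
    public static void main(String[] args) {
        // A stand-in environment that serves one fixed template for any path.
        FileSystem inMemory = path -> "Dear {name},";
        MailMerge merger = new SimpleMailMerge(inMemory);
        System.out.println(merger.merge("letter.txt", Map.of("name", "Ada")));
        // prints "Dear Ada,"
    }
}
```

Because the dependency is explicit, a third party can see what the component needs and can satisfy that need with any conforming provider.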
Instead of constructing a self-sufficient component with everything built in, a component designer may opt for maximal reuse. Although maximizing reuse has many advantages, it has one substantial disadvantage: the explosion of context dependencies. If designs of components were, after release, frozen for all time, and if all deployment environments were the same, this would not pose a problem. However, as components evolve, and different environments provide different configurations and version mixes, it becomes a showstopper to have a large number of context dependencies. To summarize: maximizing reuse minimizes use. In practice, component designers have to strive for a balance.
Component-Based Programming vs. Component Assembly
Component technology is sometimes used as a synonym for visual assembly of pre-fabricated components. Indeed, for relatively simple applications, wiring components is surprisingly productive. For example, JavaSoft's BeanBox lets a user connect beans visually and displays such connections as pieces of pipework: plumbing instead of programming.
It is useful to take a look behind the scenes. When wiring or plumbing components, the visual assembly tool registers event listeners with event sources. For example, if the assembly of a button and a text field should clear the text field whenever the button is pressed, then the button is the source of the "button pressed" event and the text field is listening for this event. While details are of no importance here, it is clear that this assembly process is not primarily about components. The button and the text field are instances, that is, objects, not components. (When adding the first object of a kind, an assembly tool may need to locate an appropriate component.)
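What an assembly tool does when it wires the button to the text field might look roughly like the following Java sketch. The classes SketchButton, SketchTextField, and PressListener are hypothetical stand-ins written for this example; they are not the AWT, Swing, or JavaBeans classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface for the "button pressed" event.
interface PressListener {
    void buttonPressed();
}

// Event source: keeps a list of registered listeners and notifies them.
final class SketchButton {
    private final List<PressListener> listeners = new ArrayList<>();

    void addPressListener(PressListener listener) {
        listeners.add(listener);
    }

    void press() {
        for (PressListener listener : listeners) {
            listener.buttonPressed();
        }
    }
}

// Event target: an ordinary object with mutable state.
final class SketchTextField {
    private String text = "";

    void setText(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }

    void clear() {
        text = "";
    }
}

final class WiringDemo {
    public static void main(String[] args) {
        SketchButton button = new SketchButton();
        SketchTextField field = new SketchTextField();
        field.setText("draft");

        // The "wire": register the field's clear action with the event source.
        button.addPressListener(field::clear);

        button.press();
        System.out.println(field.getText().isEmpty()); // prints "true"
    }
}
```

Note that the wiring connects two particular instances; nothing in it refers to the components that supplied their classes.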
However, there is a problem with this analysis. If the assembled objects are saved and distributed as a new component, how can this be explained? The key is to realize that it is not the graph of particular assembled objects that is saved. Instead, the saved information suffices to generate a new graph of objects that happens to have the same topology (and, to a degree, the same state) as the originally assembled graph of objects. However, the newly generated graph and the original graph will not share common objects: the object identities are all different.
You should then view the stored graph as persistent state but not as persistent objects. Therefore, what seems to be assembly at the instance level rather than the class level, and thus fundamentally different, is really a matter of convenience. In fact, there is no difference in outcome between this approach of assembling a component out of subcomponents and a traditional programmatic implementation that hard-codes the assembly. Visual assembly tools are free to not save object graphs, but to generate code that, when executed, creates the required objects and establishes their interconnections. The main difference, in theory, is the degree of flexibility. You can easily modify the saved object graph at run time of the deployed component, while the generated code would be harder to modify. This line is much finer than it may seem; the real question is whether components with self-modifying code are desirable. Usually they are not, since the resulting management problems immediately outweigh the possible advantages of flexibility.
It is interesting that persistent objects, in the precise sense of the term, are only supported in two contexts: object-oriented databases, still restricted to a small niche of the database market, and CORBA-based objects. In these approaches, object identity is preserved when storing objects. However, for the same reason, you cannot use these when you intend to save state and topology but not identity. You would need an expensive deep copy of the saved graph to effectively undo the initial effort of saving the universal identities of the involved objects.
On the other hand, neither of the two primary component approaches, COM and JavaBeans, immediately supports persistent objects. Instead, they only emphasize saving the state and topology of a graph of objects. The Java terminology is "object serialization." While "object graph serialization" would be more precise, this is much better than the COM use of the term "persistence" in a context where object identity is not preserved. Indeed, saving and loading again an object graph using serialization (or COM's persistence mechanisms) is equivalent to a deep copy of the object graph. (Many systems use this equivalence to implement deep copying.)
While it might seem like a major disadvantage of these approaches compared to CORBA, note that persistent identity is a heavyweight concept that you can always add where needed. For example, COM supports a standard mechanism called monikers: objects that resolve to other objects. You can use a moniker to carry a stable unique identifier (a surrogate) and the information needed to locate that particular instance. The resulting construct is about as heavyweight as the standard CORBA Object References. Java does not yet offer a standard like COM monikers, but you could add one easily.
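The equivalence between a serialization round trip and a deep copy can be demonstrated with standard Java serialization. In this sketch, GraphNode and SerializationCopy are hypothetical names; the stream classes are the standard java.io ones. The copy has the same topology and state as the original, including the cycle, but none of the original object identities.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node type forming a small object graph.
final class GraphNode implements Serializable {
    private static final long serialVersionUID = 1L;
    String label;
    GraphNode next;

    GraphNode(String label) {
        this.label = label;
    }
}

final class SerializationCopy {
    // Writes the graph reachable from root to a byte array and reads it back.
    // Checked exceptions are wrapped to keep the sketch easy to call.
    static <T extends Serializable> T deepCopy(T root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}

final class SerializationDemo {
    public static void main(String[] args) {
        GraphNode a = new GraphNode("a");
        GraphNode b = new GraphNode("b");
        a.next = b;
        b.next = a; // a cycle: a -> b -> a

        GraphNode copy = SerializationCopy.deepCopy(a);
        System.out.println(copy.label);             // prints "a": state preserved
        System.out.println(copy.next.next == copy); // prints "true": topology preserved
        System.out.println(copy == a);              // prints "false": identity not preserved
    }
}
```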
Components carry instances that act at run time as prescribed by their generating component. In the simplest case, a component is a class and the carried instances are objects of that class. However, most components (whether COM or JavaBeans) will consist of many classes. A Java Bean is externally represented by a single class and thus is a single kind of object representing all possible instantiations or uses of that component. A COM component is more flexible. It can present itself to clients as an arbitrary collection of objects whose clients only see sets of unrelated interfaces. In JavaBeans or CORBA, multiple interfaces are ultimately merged into one implementing class. This prevents proper handling of important cases such as components that support multiple versions of an interface, where the exact implementation of a particular method shared by all these versions needs to depend on the version of the interface the client is using. The OMG's current CORBA Components proposal promises to fix this problem.
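The versioning problem can be illustrated with a sketch in which two interface versions share a method signature but give it different contracts. A single Java class implementing both interfaces could supply only one body for that method. The sketch below instead hands out a separate object per interface, loosely in the spirit of COM's interface querying; the names TemperatureV1, TemperatureV2, Thermometer, and queryInterface are hypothetical and this is plain Java, not the COM API.

```java
// Version 1 contract (assumed for this example): read() returns degrees Fahrenheit.
interface TemperatureV1 {
    double read();
}

// Version 2 contract (assumed for this example): read() returns degrees Celsius.
interface TemperatureV2 {
    double read();
}

// The component's instance presents itself as one object per interface version,
// so the same method signature can behave differently per version.
final class Thermometer {
    private final TemperatureV1 v1;
    private final TemperatureV2 v2;

    Thermometer(double celsius) {
        this.v1 = () -> celsius * 9.0 / 5.0 + 32.0; // Fahrenheit for V1 clients
        this.v2 = () -> celsius;                    // Celsius for V2 clients
    }

    // Returns the object implementing the requested interface, or null if
    // the interface is not supported.
    <T> T queryInterface(Class<T> type) {
        if (type == TemperatureV1.class) {
            return type.cast(v1);
        }
        if (type == TemperatureV2.class) {
            return type.cast(v2);
        }
        return null;
    }
}

final class VersionDemo {
    public static void main(String[] args) {
        Thermometer thermometer = new Thermometer(100.0);
        System.out.println(thermometer.queryInterface(TemperatureV1.class).read()); // prints "212.0"
        System.out.println(thermometer.queryInterface(TemperatureV2.class).read()); // prints "100.0"
    }
}
```

Each client sees only the interface it asked for, so old and new clients can coexist against the same underlying instance.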
Mobile Components vs. Mobile Objects
Surprisingly, mobile components and objects are just as orthogonal as regular components and objects. As demonstrated by the Java applet and ActiveX approaches, it is useful to merely ship a component to a site and then start from fresh state and context at the receiving end. Likewise, it is possible to have mobile objects in an environment that isn't component-based at all. For example, Modula-3 Network Objects can travel the network, but do not carry their implementation with them. Instead, the environment expects all required code to already be available everywhere. It is also possible to support both mobile objects and mobile components. For example, a mobile agent (a mobile autonomous object) that travels the Internet to gather information should be accompanied by its supporting components. A recent example is Java Aglets (agent applets).
The Ultimate Difference
While components capture the static nature of a software fragment, objects capture its dynamic nature. Simply treating everything as dynamic can eliminate this distinction. However, it is a time-proven principle of software engineering to try and strengthen the static description of systems as much as possible. You can always superimpose dynamics where needed. Modern facilities such as meta-programming and just-in-time compilation simplify this soft treatment of the boundary between static and dynamic. Nevertheless, it's advisable to explicitly capture as many static properties of a design or architecture as possible. This is the role of components and architectures that assign components their place. The role of objects is to capture the dynamic nature of the arising systems built out of components. Component objects are objects carried by identified components. Thus, both components and objects together will enable the construction of next-generation software.
Blackbox vs. Whitebox Abstractions and Reuse
Blackbox vs. whitebox abstraction refers to the visibility of an implementation behind its interface. Ideally, a blackbox's clients don't know any details beyond the interface and its specification. For a whitebox, the interface may still enforce encapsulation and limit what clients can do (although implementation inheritance allows for substantial interference). However, the whitebox implementation is available and you can study it to better understand what the box does. (Some authors further distinguish between whiteboxes and glassboxes, where a whitebox lets you manipulate the implementation, and a glassbox merely lets you study the implementation.)
Blackbox reuse refers to reusing an implementation without relying on anything but its interface and specification. For example, typical application programming interfaces (APIs) reveal no implementation details. Building on such an API is thus blackbox reuse of the API's implementation. In contrast, whitebox reuse refers to using a software fragment, through its interfaces, while relying on the understanding you gained from studying the actual implementation. Most class libraries and application frameworks are delivered in source form and application developers study a class implementation to understand what a subclass can or must do.
There are serious problems with whitebox reuse across components, since whitebox reuse renders it unlikely that the reused software can be replaced by a new release. Such a replacement will likely break some of the reusing clients, as these depend on implementation details that may have changed in the new release.
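A small Java sketch can show how such a break happens. The classes BagV1 and BagV2 stand for two releases of the same hypothetical library class; they are given different names here only so both can appear in one program. The subclass code is identical in both cases, yet its behavior changes because it relied on an implementation detail (whether addAll calls add) that it learned by reading the source.

```java
import java.util.ArrayList;
import java.util.List;

// Release 1 of a hypothetical library class: addAll happens to call add.
class BagV1 {
    protected final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    public void addAll(List<String> all) {
        for (String item : all) {
            add(item); // implementation detail, not part of the documented contract
        }
    }
}

// Release 2: same interface, but addAll no longer goes through add.
class BagV2 {
    protected final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    public void addAll(List<String> all) {
        items.addAll(all);
    }
}

// Whitebox reuse: counts insertions by overriding add only, relying on
// the knowledge that addAll calls add.
class CountingBagV1 extends BagV1 {
    int added;

    @Override
    public void add(String item) {
        added++;
        super.add(item);
    }
}

// The same subclass code, compiled against release 2.
class CountingBagV2 extends BagV2 {
    int added;

    @Override
    public void add(String item) {
        added++;
        super.add(item);
    }
}

final class WhiteboxDemo {
    public static void main(String[] args) {
        CountingBagV1 oldBag = new CountingBagV1();
        oldBag.addAll(List.of("a", "b", "c"));
        System.out.println(oldBag.added); // prints "3"

        CountingBagV2 newBag = new CountingBagV2();
        newBag.addAll(List.of("a", "b", "c"));
        System.out.println(newBag.added); // prints "0": the count silently breaks
    }
}
```

A client that relied only on the documented interface of add and addAll would have been unaffected by the new release.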