Principles of Data-Centric Design
Data-centric design recognizes that the essential invariant is the information exchange between systems or components. It describes that exchange in terms of a "data model" and the producers and consumers of the data, and it relies on four basic principles:
- Expose the data and metadata. Data-centric design treats data and metadata as first-class citizens and uses them as the primary means of interconnecting heterogeneous systems. "Data" describes the "world as it is," independent of any component-specific behavior; "metadata" describes the data's layout and structure. A data-centric interface is defined by the metadata, which must contain all of the information required to encode and decode the data in a given format.
- Hide the behavior. Data-centric design hides component behavior: interfaces contain no direct references to operations or code, and cannot embed any component-specific state or behavior. Components implement behaviors that change the data or respond to changes in the data (the "world model").
- Delegate data handling to a data bus. Separating data handling from application logic is necessary for loosely coupled systems. The component's application logic should focus on manipulating the interface data, not on its management and distribution. Responsibility for data handling is delegated to a data bus, which becomes the authoritative source of the world model shared among the components.
- Explicitly define data-handling contracts. Data-handling contracts should be specified explicitly by the application at design time and enforced by the data bus at runtime. These delivery contracts specify the QoS attributes of the data a component produces and consumes, including timing, reliability, and durability. The data bus examines the contracts and, if they are compatible, establishes data flows. It then enforces the QoS contracts at runtime, giving the application code clear, known expectations.
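As an illustration, the contract matching that a data bus performs can be sketched as a request/offered check: a flow is established only when the producer offers at least what the consumer requests. The `QosContract` fields and the `compatible` rule below are simplified assumptions for illustration, not the full DDS QoS model:

```python
from dataclasses import dataclass
from enum import IntEnum

class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1

class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1

@dataclass
class QosContract:
    reliability: Reliability
    durability: Durability
    max_latency_ms: float  # a deadline-like timing bound

def compatible(offered: QosContract, requested: QosContract) -> bool:
    """Establish a flow only if the producer offers at least what the
    consumer requests: stronger reliability/durability, tighter latency."""
    return (offered.reliability >= requested.reliability
            and offered.durability >= requested.durability
            and offered.max_latency_ms <= requested.max_latency_ms)

producer = QosContract(Reliability.RELIABLE, Durability.TRANSIENT_LOCAL, 10.0)
consumer = QosContract(Reliability.BEST_EFFORT, Durability.VOLATILE, 50.0)
assert compatible(producer, consumer)  # flow established
```

Because the check runs at flow-establishment time, an incompatible pairing (say, a consumer requesting RELIABLE delivery from a BEST_EFFORT producer) is rejected before any data moves, rather than failing silently at runtime.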
Traditional Messaging Designs
By way of contrast, traditional messaging designs focus on functional or operational interfaces and overlook impedance mismatches between components. Because interface QoS and timing are not modeled, all of the interface state and communications issues are implicitly assumed. Figure 1 illustrates the result: a brittle, tightly coupled design. Adding new components or interactions violates those assumptions, forcing system designers to rework the interfaces. The architecture becomes very hard to maintain and evolve over time, because n² data paths must be explicitly managed.
A data-centric interface specifies the common, logically shared data model produced and/or consumed by a component, along with the associated QoS requirements.
As indicated in Figure 2, a component can be seen as plugging into a software data bus via the data-centric interface that defines data inputs and outputs. When multiple components are present, the result is an information-driven data-centric architecture in which data updates drive the interactions between loosely coupled components.
A data-centric architecture reduces the integration problem to n data paths, as shown in Figure 3, since each component integrates only with the common data model that is intrinsic to the problem domain. Components implement data-centric interfaces that declare what they produce or consume. The QoS contracts ensure that timing, reliability, and other requirements are met for every component, new or old. Thus, the system can grow and evolve without changes to existing components.
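The n² versus n scaling can be made concrete with a small calculation. This is a rough model (counting each directed producer-to-consumer connection), not an exact accounting of any particular system:

```python
def point_to_point_paths(n: int) -> int:
    """Point-to-point integration: every component may wire
    directly to every other component (n * (n - 1) directed paths)."""
    return n * (n - 1)

def data_bus_paths(n: int) -> int:
    """Data-centric integration: each component integrates once,
    with the common data model on the bus."""
    return n

for n in (4, 10, 50):
    print(f"{n} components: {point_to_point_paths(n)} point-to-point "
          f"paths vs {data_bus_paths(n)} data bus paths")
```

At 50 components the gap is 2450 managed paths versus 50, which is why the point-to-point architecture's maintenance cost grows so much faster than the system itself.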
The Data Bus
From a component programmer's perspective, the application code simply consumes and produces logically shared input and output variables on the data bus. The responsibility for data routing, delivery, and managing QoS can be decoupled from the application logic and delegated to the implementation of the data bus.
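That programming model can be sketched with a toy, in-process data bus. The `DataBus` class and the topic names below are hypothetical illustrations (a single-process stand-in, not a real DDS API): application code just publishes and subscribes, while routing and last-value caching live in the bus.

```python
from collections import defaultdict
from typing import Any, Callable

class DataBus:
    """Toy data bus: routes updates by topic and caches the last
    value per topic, so application code only reads and writes
    logically shared data."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)
        self._cache: dict[str, Any] = {}  # last known value per topic (the "world model")

    def publish(self, topic: str, value: Any) -> None:
        self._cache[topic] = value
        for callback in self._subscribers[topic]:
            callback(value)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)
        if topic in self._cache:  # late joiners see the current state
            callback(self._cache[topic])

bus = DataBus()
readings: list[float] = []
bus.publish("sensor/temperature", 21.5)          # no subscribers yet; value cached
bus.subscribe("sensor/temperature", readings.append)  # late joiner gets cached 21.5
bus.publish("sensor/temperature", 22.0)          # routed to the subscriber
```

Note that the component code holds no reference to any peer, only to the topic; the cache delivering the last value to a late subscriber is a crude stand-in for the durability QoS the article describes.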
The data bus requirements are fulfilled naturally by software that conforms to the DDS specification, which defines the data-centric publish-subscribe communication model for building distributed systems.
Several implementations of the DDS standard are available today, including an open-source implementation and several commercial versions from RTI, Gallium, and Miltech, among others. Leading DDS implementations provide deterministic, low-latency, high-throughput messaging and data caching. While the most natural fit for these products has been industrial, avionic, and military applications, they have long been used in the financial services industry, where rapid distribution and processing of data is a critical requirement. And increasingly, as enterprises face the task of handling large volumes of data, these products are entering business IT organizations.
One of the principal benefits for businesses is that a data-centric architecture paves the way for the use of generic infrastructure components. These include databases, complex event processing (CEP) modules, web services, and gateways to messaging services and legacy systems. Such components plug directly into the bus, without extensive custom coding to integrate them into the computing infrastructure. Done right, this model makes it possible for a spreadsheet to automatically populate its cells from data items it subscribes to on the larger data fabric.
Data-centric architecture is a paradigm for creating loosely coupled, information-driven systems. It emphasizes a common underlying semantic data model and builds distributed applications from independently developed and maintained components. Because there is no direct coupling among the application component interfaces, components in the DDS model can be added and removed in a modular and scalable manner, without the large jump in complexity that otherwise accompanies each new producer or consumer of data. As data volumes expand, the simplicity of this architecture is likely to become a crucial part of a business's ability to keep up.
For a summary overview of DDS, see: Rajive Joshi and Gerardo Pardo-Castellote, "OMG's Data Distribution Service Standard: An Overview for Real-Time Systems," Dr. Dobb's Journal, November 2006.
— Rajive Joshi, Ph.D., has been working in the area of high-performance real-time distributed systems for more than 18 years. He has been instrumental in developing distributed messaging and data distribution caching infrastructure implementations, including DDS.