Data-Centric Architecture: A Model for the Era of Big Data


Principles of Data-Centric Design

Data-centric design recognizes that the essential invariant is the information exchanged between systems or components. It describes that exchange in terms of a "data model" and of the producers and consumers of the data, and it relies on four basic principles:

  • Expose the data and metadata. Data-centric design exposes the data and metadata as first-class citizens, and uses them as the primary means of interconnecting heterogeneous systems. "Data" is the primary means of describing the "world as it is," independent of any component-specific behavior. Metadata refers to information about the data's layout and structure. A data-centric interface is defined by the metadata, which must contain all of the information required to encode and decode the data in a given format.
  • Hide the behavior. Data-centric design keeps behavior, and any direct references to operations or code, out of component interfaces. A component interface must not embed any component-specific state or behavior. Components implement behaviors that can change the data or respond to changes in the data (the "world model").
  • Delegate data handling to a data bus. Separating data handling from application logic is necessary for loosely coupled systems. The component application logic should focus on manipulating the interface data, not on managing and distributing it. The responsibility for data handling is delegated to a data bus, which is the authoritative source of the world model shared among the components.
  • Explicitly define data-handling contracts. Data-handling contracts should be specified explicitly by the application at design time and enforced by the data bus at runtime. These delivery contracts specify the quality-of-service (QoS) attributes of the data produced and consumed by a component, including timing, reliability, and durability. The data bus examines these "contracts" and, if they are compatible, establishes data flows. It then enforces the QoS contracts, giving the application code clear, known expectations.
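The fourth principle can be illustrated with a minimal sketch of how a data bus might compare contracts before connecting a producer to a consumer. It follows the "offered versus requested" idea used by DDS, in which a producer's offer must be at least as strong as a consumer's request on every attribute. The class and function names, the three attributes, and their orderings are simplified assumptions for illustration, not the API of any DDS product.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    # Ordered so that a larger value means a stronger guarantee.
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    # Simplified ordering: how long data outlives the moment it was written.
    VOLATILE = 0
    TRANSIENT_LOCAL = 1
    PERSISTENT = 2


@dataclass(frozen=True)
class QosContract:
    reliability: Reliability
    durability: Durability
    deadline_ms: float  # longest allowed gap between successive updates


@dataclass(frozen=True)
class Endpoint:
    """A producer's or consumer's data-centric interface (hypothetical)."""
    topic: str      # name of the logically shared data item
    type_name: str  # stands in for the metadata describing the data layout
    qos: QosContract


def qos_compatible(offered: QosContract, requested: QosContract) -> bool:
    """True if the producer's offer is at least as strong as the request."""
    return (
        offered.reliability >= requested.reliability
        and offered.durability >= requested.durability
        # A shorter promised deadline is a stronger guarantee.
        and offered.deadline_ms <= requested.deadline_ms
    )


def can_connect(producer: Endpoint, consumer: Endpoint) -> bool:
    """Decide whether the bus would establish a data flow between two endpoints."""
    return (
        producer.topic == consumer.topic
        and producer.type_name == consumer.type_name
        and qos_compatible(producer.qos, consumer.qos)
    )
```

For example, a sensor that offers reliable, transient-local updates every 100 ms would be connected to a display that requests best-effort, volatile data within 500 ms, but not to an archiver that requests persistent data, because on that attribute the offer is weaker than the request.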

Traditional Messaging Designs

By way of contrast, traditional messaging designs focus on functional or operational interfaces and overlook impedance mismatches between components. Because the interface QoS and timing are not modeled, all of the interface state and communication issues are implicitly assumed. Figure 1 illustrates the result: a brittle, tightly coupled design. Adding new components or interactions violates those assumptions, forcing system designers to rework the interfaces. The architecture becomes very hard to maintain and evolve over time, because on the order of n² data paths must be explicitly managed.

Figure 1: If a distributed design focuses on behavior or method interactions, it leaves the data exchange implicit. This results in a tightly coupled implementation. When new components are added, the new interactions may change the data-exchange requirements, so adding a component forces designers to rework existing components. The resulting "spaghetti" gets worse as requirements evolve.

Data-Centric Interfaces

A data-centric interface specifies the common, logically shared data model produced and/or consumed by a component, along with the associated QoS requirements.

As indicated in Figure 2, a component can be seen as plugging into a software data bus via the data-centric interface that defines data inputs and outputs. When multiple components are present, the result is an information-driven data-centric architecture in which data updates drive the interactions between loosely coupled components.

Figure 2: Components plug into a data-centric interface and offer/request data. Changes to the data drive the components' interactions.
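The idea that data updates, rather than method calls, drive the interactions can be sketched with a toy in-process bus. The `DataBus`, `Thermostat`, and `Heater` classes and the topic names are hypothetical, chosen only for illustration; a real data bus would also handle distribution across processes and enforce QoS.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable


class DataBus:
    """Toy, single-process stand-in for a data bus (hypothetical API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> None:
        # Producers never name their consumers; the bus fans each update out
        # to whichever components have declared interest in the topic.
        for callback in list(self._subscribers[topic]):
            callback(sample)


class Thermostat:
    """Consumes "Temperature" and produces "HeaterCommand"."""

    def __init__(self, bus: DataBus, setpoint_c: float) -> None:
        self._bus = bus
        self._setpoint_c = setpoint_c
        bus.subscribe("Temperature", self.on_temperature)

    def on_temperature(self, celsius: float) -> None:
        # A change in the world model (the temperature) produces new data.
        self._bus.publish("HeaterCommand", celsius < self._setpoint_c)


class Heater:
    """Consumes "HeaterCommand"; holds no reference to the thermostat."""

    def __init__(self, bus: DataBus) -> None:
        self.on = False
        bus.subscribe("HeaterCommand", self.on_command)

    def on_command(self, turn_on: bool) -> None:
        self.on = turn_on
```

Publishing a temperature of 18.5 with a 20.0 setpoint turns the heater on, and publishing 21.0 turns it off. Neither component refers to the other; each declares only the data it consumes and produces, so a new component, such as a display that subscribes to "Temperature", could be added without touching either one.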

A data-centric architecture reduces the integration problem to n data paths, as shown in Figure 3, since each component needs to integrate only with the common data model that is intrinsic to the problem domain. Components implement data-centric interfaces that declare what they produce or consume. The QoS contracts ensure that timing, reliability, and other requirements are met for any component, new or old. Thus, the system can grow and evolve without changes to the existing components.

Figure 3: Components implement data-centric interfaces that declare what they produce or consume. Compare the simplicity of this model with the one shown in Figure 1.
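The contrast between the n² and n figures can be made concrete by counting. If each of n components may have to exchange data directly with every other component, the number of component pairs to integrate is the first expression below (counting each direction separately doubles it, which is still quadratic). With a data bus, each component integrates once, against the common data model.

```latex
P_{\text{point-to-point}}(n) = \binom{n}{2} = \frac{n(n-1)}{2} \in O(n^2),
\qquad
P_{\text{bus}}(n) = n
```

For n = 10 that is 45 pairwise paths versus 10 bus connections; for n = 100 it is 4,950 versus 100.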

The Data Bus

From a component programmer's perspective, the application code simply consumes and produces logically shared input and output variables on the data bus. The responsibility for data routing, delivery, and managing QoS can be decoupled from the application logic and delegated to the implementation of the data bus.
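The "logically shared variables" view can be sketched as a bus that caches the most recent value of each keyed data instance, so that a reader that joins late still sees the current state of the world model. The `CachingDataBus` class, its methods, and the keying scheme are hypothetical, for illustration only; DDS implementations express similar behavior through keyed topics and durability QoS, with their own APIs.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable, Hashable


class CachingDataBus:
    """Toy bus that keeps the latest value of each keyed instance (hypothetical)."""

    def __init__(self) -> None:
        # topic -> instance key -> latest sample
        self._cache: dict[str, dict[Hashable, Any]] = defaultdict(dict)
        self._readers: dict[str, list[Callable[[Hashable, Any], None]]] = defaultdict(list)

    def write(self, topic: str, key: Hashable, sample: Any) -> None:
        # Update the shared "variable", then notify the current readers.
        self._cache[topic][key] = sample
        for reader in list(self._readers[topic]):
            reader(key, sample)

    def read(self, topic: str, key: Hashable) -> Any:
        # Raises KeyError if no producer has written this instance yet.
        return self._cache[topic][key]

    def subscribe(self, topic: str, reader: Callable[[Hashable, Any], None]) -> None:
        # A late-joining reader first receives the current value of every
        # instance, so it starts from the world model "as it is" now.
        for key, sample in self._cache[topic].items():
            reader(key, sample)
        self._readers[topic].append(reader)
```

The application code only writes and reads values by topic and key; where the values live and who receives them is left to the bus.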

The data bus requirements are fulfilled naturally by software that conforms to the Object Management Group's (OMG) Data Distribution Service (DDS) specification. The specification defines a data-centric, publish-subscribe communication model for building distributed systems.

Several implementations of the DDS standard are available today, including an open-source implementation and several commercial versions from RTI, Gallium, and Miltech, among others. Leading DDS implementations provide deterministic, low-latency, high-throughput messaging and data caching. While the most natural fit for these products has been in industrial, avionics, and military applications, they have long been used in the financial services industry, where the rapid distribution and processing of data is a critical requirement. Increasingly, as enterprises face the task of handling large volumes of data, these products are also entering business IT organizations.

One of the principal benefits for businesses is that a data-centric architecture paves the way for the use of generic infrastructure components. These include databases, complex event processing (CEP) modules, web services, and gateways to messaging services and legacy systems. These components plug directly into the bus without extensive custom coding to integrate them into the computing infrastructure. Done right, this model makes it possible for a spreadsheet to populate cells automatically from data items it subscribes to in the larger data fabric.
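The idea of a generic infrastructure component can be sketched with a recorder that stores samples of any type in a database table. Because it derives the table layout from the sample type's metadata (here, Python dataclass field names stand in for the type metadata a data bus would carry), it needs no per-type integration code. The `TableRecorder` class and its `on_sample` callback are hypothetical, and SQLite merely stands in for whatever database a real deployment would use.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


class TableRecorder:
    """Generic component: stores samples of any dataclass type in SQLite.

    The table layout comes from the sample type's metadata (its field
    names), so no code specific to any one data type is required.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._known_topics: set[str] = set()

    def on_sample(self, topic: str, sample: Any) -> None:
        if not is_dataclass(sample) or isinstance(sample, type):
            raise TypeError("samples must be dataclass instances")
        columns = [f.name for f in fields(sample)]
        # Identifiers are quoted but not validated; a real component would
        # check topic and field names before building SQL from them.
        if topic not in self._known_topics:
            column_list = ", ".join(f'"{c}"' for c in columns)
            self._conn.execute(f'CREATE TABLE IF NOT EXISTS "{topic}" ({column_list})')
            self._known_topics.add(topic)
        placeholders = ", ".join("?" for _ in columns)
        values = [getattr(sample, c) for c in columns]
        self._conn.execute(f'INSERT INTO "{topic}" VALUES ({placeholders})', values)
```

The same recorder would handle a stock-quote type, a sensor-reading type, or any other data type without modification; registering it with a bus would only require subscribing its `on_sample` method to the topics of interest.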

Conclusion

Data-centric architecture is a paradigm for creating loosely coupled, information-driven systems. It emphasizes a common underlying semantic data model and builds distributed applications from independently developed and maintained components. Because there is no direct coupling among the application component interfaces, components in the DDS model can be added and removed in a modular and scalable manner, without a large jump in complexity as producers and consumers of data are added to the architecture. As data volumes grow, the simplicity of this architecture is likely to become a crucial part of a business's ability to keep up.


— Rajive Joshi, Ph.D., has been working in the area of high-performance real-time distributed systems for more than 18 years. He has been instrumental in developing distributed messaging and data distribution caching infrastructure implementations, including DDS.

