Building Internet Distributed Computing Applications

By David Houlding, November 01, 2005

Protégé is a tool that lets you efficiently map out an Enterprise Architecture to enable knowledge mining for analysis and planning.

David is an enterprise architect and may be reached at [email protected].

Large-scale, heterogeneous enterprise architectures have many complex dependencies that span business, technology, and organizational aspects. Such dependencies, and the concepts they relate, form the high-level metadata about the enterprise, or the "big picture" as it is often called. Complexity in enterprise architectures is compounded by rapidly changing standards, technologies, and skill sets, as well as by growth through mergers and acquisitions, particularly in long-lived systems that are more than five years old. Even in relatively well-documented enterprise architectures, these dependencies are frequently spread across multiple documents of different versions, formats, and degrees of accuracy. Documentation often overlaps, causing redundancy and inefficiency. Conversely, there are often gaps in documentation, with critical information held only in the heads of experts, which is dangerous for the organization given the reality of staff reassignment and turnover. Without a comprehensive view of these dependencies, organizations cannot do the accurate, proactive analysis and planning needed for smooth change execution, whether the change is an upgrade, enhancement, migration, or something else. Consequently, organizations are often forced into a reactive trial-and-error mode in which dependencies are discovered late in the change execution process and must be compensated for, causing cost overruns, delays, and sometimes decreased quality or even failure.

For example, a given mission-critical software application may have software dependents, and may itself depend on multiple other software and hardware components, such as an operating system and server machine, as well as on other distributed components through various middleware interfaces. The same application may also have organizational dependencies: it is owned by a specific system administrator and has a specific set of dependent users. Business dependencies for an application include the business processes that depend on it. To plan a smooth change that affects this application, an organization must understand these business, technology, and organizational dependencies during the analysis and planning phase. To elaborate on the application example, consider an upgrade. Executing an upgrade smoothly requires understanding the related business processes, users, and system administrators, as well as the application's technology dependents and dependencies, because such an upgrade may prompt upgrades to these related components as well.

The need to define the organization's Enterprise Architecture (EA) to enable knowledge mining for analysis and planning, and smooth change execution, is clear. Such an EA knowledgebase represents a concise, high-level model of the metadata forming the big picture of the enterprise. However, to deliver its full value in supporting analysis and planning, the model must be comprehensive: it must span the business, technology, and organizational concepts and relationships without overlap, redundancy, or gaps.

Fortunately, there are well-established methods and tools to achieve this. In this article, I use Protégé (http://protege.stanford.edu/) to efficiently map out an EA to enable knowledge mining to support accurate proactive analysis and planning. Protégé includes an ontology editor and knowledgebase framework. The complete code for examples discussed here is available electronically (see "Resource Center," page 4).

Defining the Metamodel

The metamodel is the class model of the knowledgebase. The class types in the metamodel, together with their attributes and relationships, define how to create the objects and object graphs in the knowledgebase that represent instance information about the EA. Figure 1 presents the Classes view, which shows the metamodel of a simple example EA in Protégé. In this case, the Technology aspects of the EA include a Service-Oriented Architecture (SOA). Consequently, classes in the metamodel include Service, Interface, and Implementation to represent the SOA. To enable a holistic view of the enterprise, these components are also cross-linked with Business and Organizational aspects of the enterprise; for example, a Service is related to its business Requirements and organizational Owners. The Service class is highlighted in the left panel and its details are shown in the right panel, including Template Slots, which are effectively the attributes of the Service class. Some attributes are basic types (for example, Name is a string), while other attributes have types that are other classes in the metamodel. Interfaces_of_Service, for example, is a collection of zero or more Interface objects in the knowledgebase. For nonbasic types, attributes have bidirectional relationships, as indicated by the "inverse-slot" indicator to the right of each attribute definition. Classes may have zero or more superclasses in the metamodel, enabling either single or multiple inheritance. The Service class, for instance, has Component as its superclass.

The metamodel must be defined to represent the specifics of the enterprise being mapped. In other words, the metamodel represents the enterprise as it is—not as it should be. This approach subsequently enables accurate analysis and planning to form and execute change plans to incrementally evolve the EA to where it should be, tracking changes along the way with updates to the knowledgebase. While it is often possible to find significant commonality across metamodels of enterprise architectures, all enterprises have some uniqueness, and this needs to be reflected in the metamodel.

The information in the metamodel should be concise and high level, representing concepts and relationships. You would not generally want to import a detailed, free-form text document into a knowledgebase. To improve navigability in the knowledgebase, use bidirectional relationships wherever possible and appropriate. In Protégé, such relationships are represented as "inverse slots." In an SOA, for example, an inverse slot lets you find the owner of a given Service or, starting from a given Owner, the Services that person owns. Keeping information concise and high level enables efficient enhancement and maintenance of the knowledgebase. Representing concepts with all applicable relationships enables extensive dependency and relationship knowledge mining to assist in analysis and planning.
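To make the idea of a bidirectional relationship concrete, here is a minimal sketch in plain Python, independent of Protégé. The Person and Service classes and the set_owner method are hypothetical illustrations. Protégé maintains inverse slots itself, so this only mimics the behavior described above.

```python
class Person(object):
    """Hypothetical owner; 'services' plays the role of the inverse slot."""

    def __init__(self, name):
        self.name = name
        self.services = []


class Service(object):
    """Hypothetical service with a single-valued 'owner' relationship."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def set_owner(self, person):
        # Detach from the previous owner so both directions stay consistent.
        if self.owner is not None:
            self.owner.services.remove(self)
        self.owner = person
        if person is not None:
            person.services.append(self)


alice = Person("Alice")
bob = Person("Bob")
security = Service("Security")

security.set_owner(alice)
security.set_owner(bob)  # reassigning updates both sides of the relationship

# Navigate in either direction: service -> owner, or owner -> services.
owned_by_alice = [s.name for s in alice.services]
owned_by_bob = [s.name for s in bob.services]
```

Because both ends are updated in one place, a query can start from either the Service or the Owner without a search over all instances.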

Building the Model

With a metamodel defined, object instances of the classes in the metamodel may be entered to build the model of the EA knowledgebase. Such instance information may be entered interactively using the Forms and Instances views, whose tabs are visible in Figure 1. The Forms view defines, by class type, the layout and widgets used in the Instances view for that class type. The Instances view presents all the object instances of a given class type. For a selected instance, the attribute values are displayed with the user interface specified in the Forms view for that class type.

Instance information may also be imported automatically through the Protégé scripting interface. This interface is provided by the Protégé Script Tab, which can be downloaded with Protégé and enabled through the project configuration menu. It supports five scripting languages, including Python (the default). Each scripting language has access to the core of the Protégé knowledgebase, enabling create, report, update, delete, and other operations on both the metamodel and the model of the EA knowledgebase.

In Figure 1, the Service class of the metamodel is highlighted. A corresponding Python wrapper class can be defined for this Service class to support convenient use in script-based import, export, and knowledge mining; see Example 1(a). The __init__ method is the constructor of the Service class, and the Service Python class subclasses the Component Python class, mirroring the relationship in the Protégé metamodel. Based on this, a factory script function may be defined as in Example 1(b).

Note the kb Python object, which represents the central knowledgebase object accessible to the script. Internally in Protégé, this is the Java edu.stanford.smi.protege.model.DefaultKnowledgeBase object, and its public methods may be called from the Python/Jython script. Using this approach, you can write simple scripts to create, update, analyze, or export information from the knowledgebase. For example, this script can be used to populate the model:

service = createService( "Security" )
service.setInterfaces( interfaces )
service.setRequirements( requirements )
service.setOwner( owner )

In this example, interfaces and requirements are collections of Python Interface and Requirements wrapper objects, respectively, while owner is an instance of a Python Person wrapper object. The accessors for Requirements and Owner are defined on the Component Python superclass.
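The wrapper layer and factory function described above might look roughly like the following sketch. It is not the article's Example 1: so that it runs outside Protégé, the kb object here is a small in-memory stand-in (FakeKnowledgeBase and FakeInstance are hypothetical and are not the Protégé API), the slot names "Requirements" and "Owner" are assumed, and plain strings stand in for the Interface, Requirements, and Person wrapper objects.

```python
class FakeInstance(object):
    """Stand-in for a knowledgebase instance: a class name plus slot values."""

    def __init__(self, class_name, name):
        self.class_name = class_name
        self.slots = {"Name": name}


class FakeKnowledgeBase(object):
    """In-memory stand-in for the kb object; NOT the Protégé API."""

    def __init__(self):
        self.instances = []

    def create_instance(self, class_name, name):
        instance = FakeInstance(class_name, name)
        self.instances.append(instance)
        return instance


class Component(object):
    """Wrapper superclass holding the accessors shared by all components."""

    def __init__(self, instance):
        self.instance = instance

    def getName(self):
        return self.instance.slots["Name"]

    def setRequirements(self, requirements):
        self.instance.slots["Requirements"] = list(requirements)

    def getRequirements(self):
        return self.instance.slots.get("Requirements", [])

    def setOwner(self, owner):
        self.instance.slots["Owner"] = owner

    def getOwner(self):
        return self.instance.slots.get("Owner")


class Service(Component):
    """Wrapper for the Service class; adds the interface accessors."""

    def __init__(self, instance):
        Component.__init__(self, instance)

    def setInterfaces(self, interfaces):
        self.instance.slots["Interfaces_of_Service"] = list(interfaces)

    def getInterfaces(self):
        return self.instance.slots.get("Interfaces_of_Service", [])


kb = FakeKnowledgeBase()


def createService(name):
    """Factory: create the underlying instance and return it wrapped."""
    return Service(kb.create_instance("Service", name))


# Hypothetical sample values; strings stand in for wrapper objects.
service = createService("Security")
service.setInterfaces(["Authenticate", "Authorize"])
service.setRequirements(["Protect customer data"])
service.setOwner("Alice")
```

Only the wrapper classes touch the stand-in's slots directly, so a script that calls createService and the accessors would not need to change if the underlying storage were swapped for the real knowledgebase.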

In addition to keeping scripts concise, intuitive, and object oriented, a significant benefit of this Python object wrapper layer is that it isolates scripts from the Protégé internal application architecture and the metamodel, enabling both to evolve easily without having to update all scripts. The Protégé scripting interface offers ultimate flexibility and is robust for automated import. There are also various plug-ins available for Protégé that handle specialized automated imports with varying degrees of applicability, flexibility, and maturity. Ultimately, experienced users of Protégé will maximize their use of automated import capability to both build and update the model information. This automation enables Protégé and the EA knowledgebase to be the center of the "information flows" of the organization, where information from many different origins and formats converge through import into the model, and may then be exported in various views, either for other applications (for example, spreadsheets), or for reports to support business deliverables or decisions.

A Practical Approach to Building an EA Knowledgebase

Even though an EA knowledgebase typically contains only high-level, concise metadata about the business, technology, and organization, rarely can organizations afford to import all their EA information before deriving value out of the EA knowledgebase. This is because even for a moderately complex EA, this could take years and most organizations can't wait that long before seeing some return on investment (ROI). A more practical approach to populating the model for the EA knowledgebase is to drive incremental updates with specific, well-defined, short-term business ROI. Using this approach with a particular short-term business need or deliverable in mind, you can define the information and presentation required to support the need, update the metamodel to represent this information, load instance information from multiple sources as required, and complete the cycle with exporting a report—both with the required information and in the required format to support the business need. At the end of each of these incremental iterative cycles, an additional layer of information is added to the EA knowledgebase, building it up layer-by-layer like an onion, and with each step delivering more value to the organization. One way to think of this is incremental iterative growth with each step making sense from a business standpoint in itself, and with no long-term leaps of faith required.

Knowledge Mining

The value in information comes from acting upon it. You can derive value from an EA knowledgebase by knowledge mining. There are a variety of techniques for knowledge mining with Protégé, including dynamic queries through the Queries tab in Figure 1. The dynamic query capability provided by the Protégé Queries tab excels at queries that retrieve objects whose direct attributes have specified values. However, to query for an object based on deeper, indirect relationships with other objects, you can use the scripting interface. For example, using the Python wrapper class approach, you could form "queries" like Example 2(a). Running this query in the Protégé script console with the example EA knowledgebase gives the result in Example 2(b).
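As an illustration of a query over indirect relationships, the following sketch walks two hops: from an owner to the services they own, and from those services to the business requirements they support. It is not the article's Example 2. The data is hypothetical, and plain dictionaries stand in for the wrapper objects so that the sketch runs outside Protégé.

```python
# Hypothetical sample data standing in for Service instances and their slots.
services = [
    {"name": "Security", "owner": "Alice",
     "requirements": ["Protect customer data"]},
    {"name": "Billing", "owner": "Bob",
     "requirements": ["Invoice customers", "Protect customer data"]},
    {"name": "Audit", "owner": "Alice",
     "requirements": ["Meet compliance rules"]},
]


def requirements_for_owner(owner):
    """Two-hop query: owner -> services they own -> requirements."""
    found = set()
    for service in services:
        if service["owner"] == owner:
            found.update(service["requirements"])
    return sorted(found)


def owners_for_requirement(requirement):
    """Reverse direction: requirement -> supporting services -> owners."""
    owners = set()
    for service in services:
        if requirement in service["requirements"]:
            owners.add(service["owner"])
    return sorted(owners)


for requirement in requirements_for_owner("Alice"):
    print(requirement)
```

The second function answers a typical change-planning question: who needs to be consulted before a change that touches a given requirement.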

The output of this query is simple text. You can easily put structure around this (as XML tags, for example) and export to a file for import into other tools and reporting formats to enable further analysis and knowledge mining.
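One way to put XML structure around query results is sketched below using the standard xml.etree.ElementTree module. The element and attribute names are assumptions, not a format defined by the article, and the encoding="unicode" argument assumes a current CPython rather than the Jython bundled with Protégé.

```python
import xml.etree.ElementTree as ET


def results_to_xml(rows):
    """Wrap (service name, owner) pairs in a simple XML structure."""
    root = ET.Element("services")
    for name, owner in rows:
        element = ET.SubElement(root, "service", {"name": name})
        ET.SubElement(element, "owner").text = owner
    return ET.tostring(root, encoding="unicode")


# Hypothetical query results.
rows = [("Security", "Alice"), ("Billing", "Bob")]
xml_text = results_to_xml(rows)
print(xml_text)
```

Building the tree with ElementTree rather than concatenating strings means that special characters in names are escaped automatically. The resulting text can be written to a file for other tools to import.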

Numerous Protégé plug-ins that support visualization of different views of the metamodel and/or model are also available at http://protege.stanford.edu/.

Automating EA Updates

It is imperative that EA knowledgebases mirror real enterprises as accurately as possible. Therefore, they must be updated frequently to track changes and ensure consistency. This presents a considerable challenge, especially considering:

Even with a concise, high-level representation of the metadata of the enterprise, there is still a vast amount of information to import to build a knowledgebase.
Enterprise architecture information is typically spread across multiple different sources (for example, spreadsheets, schema, and various databases), across various teams, and with varying degrees of accuracy and currency.
Some information may exist only in people's heads, and be otherwise undocumented.

Therefore, in a practical and successful use of EA, the import of information to build and update the knowledgebase must be highly automated. It also must include validation to detect gaps and inconsistencies.

A working example for automating updates of Protégé EA knowledgebases is available electronically (see the Example/Import subdirectory). In this example, key information about services in an SOA is stored in an Excel spreadsheet called "Services.xls." You can define a schema Services.xsd that defines an XML document type structure, then add this as an XML Source to the Services.xls spreadsheet and save the spreadsheet as the XML document Services.xml. Using the XSLT transform in ServicesToImport.xsl, you can then transform this XML document into the Python script in ImportServices.py that can be executed in Protégé to update the knowledgebase. Example 3 is a snippet of the generated script for updating the Security Service in the SOA.
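The last step of this pipeline, turning an XML document into an import script, can be sketched in Python. This is a stand-in for illustration, not the article's XSLT: it uses xml.etree.ElementTree in place of a stylesheet, the XML layout is an assumed guess at what Services.xsd might describe, and findPerson is a hypothetical helper that the generated script would need.

```python
import xml.etree.ElementTree as ET

# Assumed document structure; the real Services.xsd is not shown here.
SERVICES_XML = """
<Services>
  <Service><Name>Security</Name><Owner>Alice</Owner></Service>
  <Service><Name>Billing</Name><Owner>Bob</Owner></Service>
</Services>
"""


def generate_import_script(xml_text):
    """Emit one block of wrapper calls per Service element."""
    lines = []
    for element in ET.fromstring(xml_text.strip()).findall("Service"):
        name = element.findtext("Name")
        owner = element.findtext("Owner")
        # %r quotes and escapes the values so the output is valid Python.
        lines.append("service = createService(%r)" % name)
        lines.append("service.setOwner(findPerson(%r))" % owner)
    return "\n".join(lines)


script = generate_import_script(SERVICES_XML)
print(script)
```

Generating a script, rather than updating the knowledgebase directly, leaves an artifact that can be reviewed before it is run.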

While Protégé plug-ins are available for updating, validating, and exporting knowledgebases, the scripting approach presented here is the most robust and flexible technique, and the same scripts can be used for update, validation, and export.

Validating the EA

To support accurate analysis and planning, the EA knowledgebase must be as complete, up-to-date, and accurate as possible. Granted, an EA knowledgebase is unlikely to be 100-percent accurate, and is only as good as the information entered into it. However, through the use of validation techniques, you can achieve a level of accuracy far greater than that of any individual information source on its own. Any single information source used to update the knowledgebase is unlikely to be perfect and will have some gaps and inconsistencies. With simple cross-validation techniques, you can detect gaps and inconsistencies, trigger corrections, and greatly increase the accuracy of the EA knowledgebase. These techniques can be applied either during automated imports and updates, or afterward on the EA knowledgebase itself, using validation business rules to enforce known requirements and detect inconsistencies for potential correction.

For example, a list of services can be loaded from the best available source. A second information source with service information can then be loaded using scripts that raise exceptions where services are referenced in the second information source that don't exist in the EA knowledgebase. Such exceptions can be reviewed and corrected manually, either by correcting the information sources and reimporting, or by interactively updating the EA knowledgebase.
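A minimal sketch of this cross-validation step follows. The exception class, the source file names, and the sample data are all hypothetical, and a plain set stands in for the services already loaded into the knowledgebase.

```python
class UnknownServiceError(Exception):
    """Raised when a source references a service missing from the knowledgebase."""


# Services already loaded from the best available source (hypothetical data).
known_services = set(["Security", "Billing"])


def check_references(source_name, referenced_services):
    """Raise if the second source mentions services the knowledgebase lacks."""
    missing = sorted(set(referenced_services) - known_services)
    if missing:
        raise UnknownServiceError(
            "%s references unknown services: %s"
            % (source_name, ", ".join(missing))
        )


# A second source that is consistent with the knowledgebase passes silently.
check_references("owners.csv", ["Security", "Billing"])

# A source with a misspelled service and a missing service is flagged.
try:
    check_references("interfaces.csv", ["Security", "Biling", "Reporting"])
    problem = None
except UnknownServiceError as error:
    problem = str(error)

print(problem)
```

Collecting all missing names into one exception, rather than failing on the first, gives the reviewer the full list to correct in a single pass.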

Once information is loaded, you can write simple scripts that verify relationships that must hold true. For example, you may assert that each service in the SOA must have an owner for accountability and coordination of changes and use. Example 4 is a script to ensure all services have an owner, and otherwise print the service name to trigger interactive assignment of any potential orphaned services to new owners.
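The ownership rule can be sketched as follows. This is not the article's Example 4. Dictionaries with hypothetical data stand in for the Service wrapper objects, and a missing or empty owner is treated as "no owner."

```python
# Hypothetical sample data; in Protégé these would be Service wrapper objects.
services = [
    {"name": "Security", "owner": "Alice"},
    {"name": "Billing", "owner": None},
    {"name": "Audit"},
]


def find_orphaned_services(services):
    """Return the names of services that have no owner assigned."""
    return [s["name"] for s in services if not s.get("owner")]


orphans = find_orphaned_services(services)
for name in orphans:
    print("Service without owner: %s" % name)
```

Other rules of the form "every X must have a Y" can follow the same pattern: one small function per rule, each returning the offending instances.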

Automating EA Exports

With an EA knowledgebase at the convergence point of information flows in the enterprise, there is a need to export "views" from the knowledgebase to other tools used for specific tasks. Consider, for example, an export of a slice of information from a knowledgebase to an Excel spreadsheet. This can be done in a manner similar to the automated import approach, except that the steps are reversed. A simple script can mine the knowledgebase and write information to an XML file in a format suitable for import into the target tool. Alternatively, a more generic XML export structure can be used, then transformed with an XSLT stylesheet into a specific target format. The XSLT approach is particularly appropriate where a given view of the knowledgebase is destined for more than one target application or report, each of which may require a different format supported by its own stylesheet.
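The idea of one view feeding several target formats can be sketched without XSLT. In the following illustration, plain Python functions play the role that separate stylesheets would play in the approach above. The view contents and field names are hypothetical, and io.StringIO assumes a current CPython.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A generic "view" mined from the knowledgebase (hypothetical data).
view = [
    {"service": "Security", "owner": "Alice"},
    {"service": "Billing", "owner": "Bob"},
]


def to_csv(rows):
    """Render the view as CSV text, for example for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["service", "owner"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Render the same view as XML, one element per row."""
    root = ET.Element("view")
    for row in rows:
        ET.SubElement(root, "row", row)
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(view)
xml_text = to_xml(view)
print(csv_text)
print(xml_text)
```

Keeping the mining step separate from the formatting step means a new target tool needs only a new formatter, not a new query.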

Generating EA Reports

Where information from an EA knowledgebase is destined for human consumption, reports may be generated in PDF format. A working example of how to generate such a report from a Protégé EA knowledgebase is available electronically in the Example/Report subdirectory. A script can mine information from the EA knowledgebase. For example, the script BusinessTechnologyTraceabilityReport.py writes information to the XML file BusinessTechnologyTraceabilityReport.xml, which can then be transformed using the XSLT stylesheet in BusinessTechnologyTraceabilityReport.xsl into the raw XSL-FO document BusinessTechnologyTraceabilityReport.fo. This XSL-FO document can then be rendered as a polished PDF report using the Apache Formatting Objects Processor (FOP) at http://xml.apache.org/fop/; for example, see BusinessTechnologyTraceabilityReport.pdf. Reports in other formats can be generated using the same process with different stylesheets.

Evolving Toward an EA-Centric Organization

In the short term, an EA approach can't stop the acquisition or use of information in the current formats and tools of choice. Rather, EA must fit into the existing process and support current formats and tools for information sources. Similarly, EA must enable export of information to existing tools and reporting formats. In the long run, it may be possible for some information sources to be eliminated and information entered instead directly into the EA model. Similarly, it may be possible to support analysis directly in EA, rather than exporting to other tools and reports. However, for EA to be successful, a disruptive "big-bang" approach must be avoided. Instead, EA must integrate into the enterprise and its processes as they currently exist, and thereby position the organization for a controlled, incremental iterative change process toward a more EA-knowledge-base-centric organization. Similarly, EA should reflect the enterprise as it exists, rather than as it should ideally be. This enables change planning and smooth incremental iterative change execution toward the target state of the enterprise, tracking each step of the transition in the EA knowledgebase to keep it accurate and enable it to support future analysis and change planning.

Decentralizing EA Knowledgebase Use

In the longer term, it is unrealistic to expect a small group of architects to do all import/export of instance information to/from the model of an EA knowledgebase. Rather, a decentralized approach is more practical, where knowledge owners across the enterprise enter and maintain information they "own" in EA. To properly encourage knowledge owners to do so, it is essential that they can generate value early from EA; for example, enabling them to source key deliverables or reports they require from the EA knowledgebase. On the other hand, due to the pervasive impact of the metamodel aspects of the EA knowledgebase, architects should carefully review, approve, and execute updates to the classes, attributes, and relationships that form the foundation of the knowledgebase.

Conclusion

The need to manage the growing complexity in large-scale enterprise architectures is clear. The Protégé ontology editor and knowledgebase framework meet this challenge. This paves the way for organizations to accurately represent and maintain their enterprise architectures in an EA knowledgebase to support business, technology, and organizational needs.

DDJ
