Brian is director of the Electronic Commerce Group at Hello Direct. He can be contacted at [email protected]
Software developers creating applications for use on public and private networks confront a number of performance issues which, to date, have restricted the utility of network-delivered software. While processing power has increased several orders of magnitude in the past 20 years, bandwidth remains a bottleneck and, for many users, will continue to be an issue for years to come.
The concept-registry system described here, along with concept-oriented programming (a technique that exploits it), makes it possible to write software that requires far less bandwidth to deliver, which can increase apparent delivery speeds by as much as an order of magnitude. It also creates a mechanism for disseminating reusable code throughout the Internet, effectively turning the Net into a repository of reusable code that many developers can draw on. This system does not introduce any fundamentally new ideas; instead it combines existing concepts and methods, including:
- Numeric codes to represent symbols (characters, machine instructions, and the like).
- A distributed database that maps between a numeric domain and other domains (DNS, for example).
- Semantic networks.
The concept-registry implementation I present here is an accidental outgrowth of another project. Concept registry was originally intended for use in a multilingual communication system called "Picto" (short for "Pictograph"), a markup language that lets users publish simple messages that can be rendered into multiple languages. The original idea behind Picto was to create a chat tool best described as "emoticons on steroids," where each symbol has a distinct meaning.
Example 1 is a simple Picto message that translates to "Hello World" (where concept #1 is "Hello" and concept #2 is "World"). While not adequate for complex messages, this markup language can be used to convey simple messages that can then be rendered in multiple languages. People won't be using it to quote Shakespeare, but it works fine for exchanging simple messages with predictable grammar (multilingual chat is one candidate application). Real-time applications are especially interesting candidates because users can adapt to the idiosyncrasies of the translation tool (for example, being forced to clarify meaning when a word has many possible meanings or uses).
The concept-registry system, when used in conjunction with this markup language, maps numeric expressions into target languages. Numeric concepts are tagged with usage parameters that describe how they are used in an expression, and how they are linked to other concepts in an expression.
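The mapping from numeric expressions to target languages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LEXICON` table, the `render` function, and the Spanish and French entries are hypothetical, and the sketch ignores the usage parameters and grammar handling that a real Picto renderer would need. Only concepts #1 ("Hello") and #2 ("World") come from the example above.

```python
# Hypothetical lexicon: concept ID -> {language code -> word}.
# Concepts #1 and #2 follow the "Hello World" example; the non-English
# entries are illustrative assumptions, not part of the Picto project.
LEXICON = {
    1: {"en": "Hello", "es": "Hola", "fr": "Bonjour"},
    2: {"en": "World", "es": "Mundo", "fr": "Monde"},
}


def render(message, language):
    """Render a sequence of concept IDs as text in the target language."""
    words = []
    for concept_id in message:
        translations = LEXICON.get(concept_id, {})
        # Fall back to a visible placeholder when no translation is registered.
        words.append(translations.get(language, f"#{concept_id}"))
    return " ".join(words)


hello_world = [1, 2]
print(render(hello_world, "en"))
print(render(hello_world, "es"))
```

The message itself stays language neutral; only the lookup table changes from one reader to the next.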
To find out more about concept registry and to contribute to this open-source project, visit http://www.picto.org/. There you will find open-source utilities and information for use in building back-end and client-side implementations of this technique. These utilities and the source code for a concept-registry server are also available electronically from DDJ (see "Resource Center," page 5).
Concept-oriented programming is a straightforward extension to object-oriented programming. Its primary contribution is to turn wide area networks (WANs) into a facility for software development and distribution.
The most important additions are a global address space that uniquely identifies reusable machine instructions, and a global network of registry servers that cache these concepts for rapid retrieval at run time. The technique has numerous applications and can be used with any programming language or operating system. The system described in this article lets concepts be defined in many machine-language and natural-language domains simultaneously (a capability that also allows it to be used as a global help file).
Suppose you write a sorting algorithm that you want to share with other programmers. You compile it into a DLL or some other executable form so that other programs can easily reference it at run time. You then register the procedure with the concept-registry system, which assigns your sorting algorithm a unique numeric ID. Say, for the sake of example, that your algorithm becomes concept #51221. No other algorithm will be assigned this number. Other programmers could then reference this procedure in their programs with a statement such as this hypothetical example:
SortedScores = SortClass.SortScores(Score)
What's a Concept?
The concept-registry system creates a numeric address space for concepts. In this system, concepts are simply numeric placeholders for ideas. A concept could refer to a reusable machine instruction, VRML object, abstract idea, or natural-language expression. Each concept is given a unique numeric address so that it will not be confused with other concepts. Just as you request an IP address for a new workstation, you would request a concept-registry system address for a new idea (whether that idea is a machine instruction or natural-language expression). Table 1 lists some hypothetical concept-registry entries.
The concept-registry system consists of two important components:
- A numeric address space, which uniquely identifies all globally registered concepts.
- Concept-registry servers, which are distributed throughout public and private networks, process concept-resolution requests, and disseminate translation tables throughout the network.
The concept-registry system is, in a sense, like the domain name system (DNS), except that it maps numerically identified concepts into many language domains. What is especially interesting is that the concept-registry system typically indexes a concept in multiple languages. In Example 2, for instance, the system:
- Translates concept #51221 into Java bytecode. The concept-registry server replies with the Java bytecode for this instruction.
- Translates concept #51221 into English. The concept-registry server replies with an English description of what the procedure does.
- Translates concept #51221 into Spanish. The concept-registry server replies with a Spanish description of what the procedure does.
In this example, the concept-registry server is processing requests to translate numerically identified concepts into either machine instructions or natural-language expressions (that is, to provide explanation or documentation for the concept). The concept-registry server is not required to understand the information it is providing. Like other directory servers, it merely maps information from one domain into another.
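A minimal sketch of this kind of directory lookup follows. The class name, the domain labels ("java-bytecode", "en", "es"), the placeholder bytes, and the two descriptions are assumptions made for illustration; they are not taken from the actual concept-registry server.

```python
class ConceptRegistryServer:
    """Toy directory: maps (concept ID, target domain) to an opaque payload."""

    def __init__(self):
        self._table = {}

    def publish(self, concept_id, domain, payload):
        self._table[(concept_id, domain)] = payload

    def resolve(self, concept_id, domain):
        # The server never interprets the payload; it only looks it up.
        key = (concept_id, domain)
        if key not in self._table:
            raise LookupError(f"concept #{concept_id} has no entry for {domain!r}")
        return self._table[key]


server = ConceptRegistryServer()
# Placeholder bytes standing in for compiled bytecode (not a real class file).
server.publish(51221, "java-bytecode", b"\xca\xfe\xba\xbe")
# Illustrative descriptions; the article does not say what #51221 documents.
server.publish(51221, "en", "Sorts a list of scores in ascending order.")
server.publish(51221, "es", "Ordena una lista de puntuaciones de menor a mayor.")

for domain in ("java-bytecode", "en", "es"):
    print(domain, "->", server.resolve(51221, domain))
```

The same numeric key serves both the run-time loader and the human reader, which is what lets one registry double as code repository and documentation store.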
Concept-Registry System Services
The concept-registry system provides the following basic services:
- Concept resolution/translation. Local concept-registry servers process requests to translate a numeric concept into a target language. Concepts can be translated into machine languages, display languages, or natural-language expressions.
- Concept distribution/replication. Just as the DNS distributes updated host tables, the concept-registry system will update concept registries on a daily basis.
- Conflict resolution. Master concept registries ensure that duplicate ID numbers are not assigned to concepts, thus ensuring that each concept has its own unique address.
- Reverse lookups. Concept-registry servers can search for a pattern in their table of registered concepts. This is used in multilingual applications, specifically to create lexicon services and translation aids.
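Two of the services listed above, conflict resolution and reverse lookups, can be sketched as follows. This is a single-process illustration under simplifying assumptions: the `MasterRegistry` class and its method names are hypothetical, IDs are handed out by a simple counter, and the reverse lookup is a plain substring match rather than whatever pattern language a real server would offer.

```python
import itertools


class MasterRegistry:
    """Toy master registry: hands out unique IDs and supports reverse lookup."""

    def __init__(self, first_id=1):
        self._next_id = itertools.count(first_id)
        self._descriptions = {}  # concept ID -> {language code -> text}

    def register(self, language, text):
        # Conflict resolution: every call consumes a fresh ID from the counter,
        # so two concepts can never share an address.
        concept_id = next(self._next_id)
        self._descriptions[concept_id] = {language: text}
        return concept_id

    def add_translation(self, concept_id, language, text):
        self._descriptions[concept_id][language] = text

    def reverse_lookup(self, pattern, language):
        # Reverse lookup: find concept IDs whose text contains the pattern.
        needle = pattern.lower()
        return sorted(
            concept_id
            for concept_id, texts in self._descriptions.items()
            if needle in texts.get(language, "").lower()
        )


registry = MasterRegistry(first_id=51221)
sort_id = registry.register("en", "Sort a list of scores")
plot_id = registry.register("en", "Draw a histogram")
registry.add_translation(sort_id, "es", "Ordenar una lista de puntuaciones")

print(sort_id, plot_id)
print(registry.reverse_lookup("sort", "en"))
print(registry.reverse_lookup("ordenar", "es"))
```

A lexicon service or translation aid would sit on top of `reverse_lookup`: find the concept by a word in one language, then read its entry in another.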
Creating High-Performance Network Software
One of the greatest practical benefits of this system is the ability to reduce the size of network-delivered software, thereby increasing apparent transmission speeds. The system creates, in effect, a smart caching system that eliminates the redundant transmission of instructions, and lets users cache large libraries of reusable machine instructions in close proximity to end users.
Instead of transmitting the entire program to users, you can send only the upper layers of the program, which, in turn, reference numerically identified instruction sets that may or may not be cached on the end user's computer. If the end user's computer has encountered these concepts before, it will fetch the underlying instructions from a local cache. If not, it will contact a nearby concept-registry server to request the underlying instructions. While this introduces obvious security issues (see http://www.picto.org/), the technique lets you realize order-of-magnitude improvements in apparent delivery speeds.
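The cache-then-registry lookup described above might look like the following sketch. The `ConceptLoader` class, the `NEARBY_REGISTRY` dictionary standing in for a network server, and the byte string are all hypothetical. The sketch also performs no integrity or authenticity check on what it fetches, which a real loader would need given the security issues just mentioned.

```python
class ConceptLoader:
    """Fetch a concept from the local cache, else from a nearby registry."""

    def __init__(self, registry_fetch):
        self._registry_fetch = registry_fetch  # callable: concept ID -> payload
        self._local_cache = {}
        self.registry_requests = 0

    def load(self, concept_id):
        if concept_id in self._local_cache:
            # Seen before: no network traffic at all.
            return self._local_cache[concept_id]
        # First encounter: ask the nearby registry server, then remember it.
        self.registry_requests += 1
        instructions = self._registry_fetch(concept_id)
        self._local_cache[concept_id] = instructions
        return instructions


# A dictionary stands in for the nearby concept-registry server.
NEARBY_REGISTRY = {51221: b"<sort instructions>"}

loader = ConceptLoader(NEARBY_REGISTRY.__getitem__)
first = loader.load(51221)   # goes to the registry
second = loader.load(51221)  # served from the local cache
print(loader.registry_requests)
```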
I call these programs "origami executables" because they are comparatively tiny programs consisting of numeric pointers to underlying instruction sets (which may themselves contain references to other concepts). These programs expand into a complete set of instructions at run time, thus increasing apparent transmission speed to users. (While this technique will substantially improve delivery times, it will not improve execution speed.)
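The run-time expansion of such a program can be illustrated with a small recursive sketch. The representation is an assumption chosen for brevity: a program is a list whose items are either literal instructions (strings) or references to concepts (integers). Concept #60001 and the instruction names are made up for the example.

```python
# Hypothetical registry: concept ID -> body, which may reference other concepts.
REGISTRY = {
    51221: ["compare", "swap", 60001],
    60001: ["loop"],
}


def expand(program, registry, _active=()):
    """Replace every concept reference with its (recursively expanded) body."""
    instructions = []
    for item in program:
        if isinstance(item, int):
            if item in _active:
                # Guard against a concept that refers back to itself.
                raise ValueError(f"circular reference to concept #{item}")
            instructions.extend(expand(registry[item], registry, _active + (item,)))
        else:
            instructions.append(item)
    return instructions


origami = ["read_scores", 51221, "print_scores"]
print(expand(origami, REGISTRY))
```

Here the transmitted program has three items, while the expanded form has five; in practice the gap would be far larger, since each reference could stand for an entire library.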
Example Scenario #1
Consider, for example, a scenario in which a corporate workgroup is running applications over a WAN (Figure 1). A corporation installs concept-registry servers throughout its WAN. The concept-registry servers have a 10/100 Mbits/sec path to end users, and are constantly updated with the latest concepts (much as DNS automatically distributes updates to DNS servers daily, so too will the concept-registry system).
Users on these networks will receive most of the instructions from local concept-registry servers that have a 10/100 Mbits/sec path to users. Since these concept-registry servers cache instructions used by the entire workgroup, the performance improvements are impressive. Instead of loading applets from a central point through a congested WAN link, users have an apparent 10/100 Mbits/sec connection to the server.
To calculate the performance improvement, use the formula:
ACR = TB/TC
where TC is the time to deliver code using concept-registry technique and TB is the time to deliver code using conventional technique. Then calculate:
TB = (UC + PL + CL)/IBW
TC = (UC/IBW) + (PL/CBW) + (CL/DBW)
where ACR is the apparent compression ratio, IBW is the Internet bandwidth (effective throughput from end user to distant server), CBW is the bandwidth from end user to nearby concept-registry server, DBW is the bandwidth to local disk drive or LAN-based registry, UC is the unique code size in KB (your program and its unique libraries), PL is the size of publicly registered concepts in KB, and CL is the size of locally cached concepts in KB.
The key metric, the apparent compression ratio, expresses how much faster the program appears to load; multiplying it by the actual Internet bandwidth gives the perceived bandwidth available to load the program. The technique easily increases apparent throughput several times, and when fully exploited can deliver order-of-magnitude improvements.
Example Scenario #2
In this scenario, assume that 475 KB of a 500-KB program is stored in the concept-registry system and 25 KB is unique to the application. Users are on a small LAN with a 128-Kbits/sec connection to the Internet, and an in-house concept-registry server that has a 10 Mbits/sec path to users. The user's disk drive has 100 Mbits/sec of bandwidth. The program contains several widely used concepts, some of which (say 25 percent) the user has encountered before. According to our formula, the apparent compression ratio will be 16.86:1, making the user's 128-Kbits/sec connection look like a 2.15 Mbits/sec connection.
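The arithmetic for this scenario can be checked with a short script. It assumes that the Internet link runs at 128 Kbits/sec, that the 25 percent already encountered applies to the 475 KB of registered code (so CL = 118.75 KB and PL = 356.25 KB), that 1 KB is 8 kilobits, and that 1 Mbit is 1000 Kbits. The function and parameter names are hypothetical. Under these conventions the ratio comes out near 16.8:1, in line with the figure quoted above; the last digits depend on the unit conventions chosen.

```python
def apparent_compression_ratio(uc_kb, pl_kb, cl_kb, ibw_kbps, cbw_kbps, dbw_kbps):
    """ACR = TB / TC, with sizes in KB and bandwidths in Kbits/sec."""

    def kilobits(kilobytes):
        return kilobytes * 8

    # Conventional delivery: everything crosses the Internet link.
    tb = kilobits(uc_kb + pl_kb + cl_kb) / ibw_kbps
    # Concept-registry delivery: each part travels over its own path.
    tc = (
        kilobits(uc_kb) / ibw_kbps
        + kilobits(pl_kb) / cbw_kbps
        + kilobits(cl_kb) / dbw_kbps
    )
    return tb / tc


registered_kb = 475
cached_share = 0.25  # portion of registered code already on the user's disk

acr = apparent_compression_ratio(
    uc_kb=25,
    pl_kb=registered_kb * (1 - cached_share),
    cl_kb=registered_kb * cached_share,
    ibw_kbps=128,
    cbw_kbps=10_000,
    dbw_kbps=100_000,
)
print(f"ACR = {acr:.2f}:1, apparent bandwidth = {acr * 128 / 1000:.2f} Mbits/sec")
```

Nearly all of TC is the 25 KB of unique code crossing the slow link, so shrinking that portion is what moves the ratio most.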
To support a capability such as this, programming languages need to be extended to support concept notation, and to create executable code that can be distributed independently of the rest of the program (a mini DLL, in other words). Adding support for concept notation to a program is not that difficult. When concept notation is incorporated into languages, the compiler merely needs to be able to talk to a concept-registry server to obtain the machine language "translation" for a given concept and merge this code into a program, either at compile time or run time. The details of how concept notation is expressed in each language vary. Examples of how this might appear include:
SortedScores = SortScores(ClassScores As Array) UseConcept(51221)
Bind SortClass Using 51221;
Bind HistogramClass Using 78910;
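In a language without dedicated concept notation, run-time binding could be approximated with an ordinary function call. The Python sketch below is one such approximation, not the notation proposed above: the `bind` function and `LOCAL_REGISTRY` table are hypothetical, and Python's built-in `sorted` merely stands in for whatever code is actually registered as concept #51221.

```python
# Hypothetical stand-in for a concept-registry server. The built-in sorted()
# plays the role of the code registered under concept #51221.
LOCAL_REGISTRY = {51221: sorted}


def bind(concept_id, registry=LOCAL_REGISTRY):
    """Resolve a concept ID to a callable at run time."""
    try:
        return registry[concept_id]
    except KeyError:
        raise LookupError(f"concept #{concept_id} is not registered") from None


# Rough equivalent of "Bind SortClass Using 51221".
sort_scores = bind(51221)
class_scores = [88, 72, 95]
sorted_scores = sort_scores(class_scores)
print(sorted_scores)
```

The calling code names the concept only by number; which implementation arrives depends on what the registry returns for that number.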
Concept-Oriented Operating Systems
Concept-oriented techniques can also be used to build operating systems. A concept-oriented operating system would have some attractive features compared to current systems, including:
- Compact design. The OS could be distributed as a very small package that would then obtain additional OS components from the concept-registry system.
- Continual evolution. The OS would evolve automatically as new components are registered. This does away with the notion of upgrading an OS.
- Network appliances. Such an OS would be highly useful for inexpensive network appliances. The appliance would contain only the code needed to boot itself, and would obtain higher level components from nearby concept-registry servers, thus reducing the cost of maintaining these devices.
- Rapid innovation. Automatic dissemination of updates to users increases the rate at which the OS evolves. An open-source OS based on this technique would benefit from contributions from many sources.
- Automated replication. Every machine running a concept-oriented OS could, in turn, become a concept-registry server, providing nearby machines with one or many high-speed servers to talk to. Each new machine increases the overall processing capacity of the concept-registry system as a whole.
A concept-oriented OS would not be fundamentally different from a conventional OS, except that all of its components would be registered in the concept-registry system. The only new feature is the use of the global addressing scheme provided by concept-registry systems to track and retrieve components from the network.
A hypothetical OS could be delivered as a small package that would provide basic I/O, logic, and not much else -- just enough to boot the machine in VGA mode and start talking to the network. Once launched, the OS would automatically obtain the additional concepts it requires. This could be done on an as-needed basis (don't download the floating-point math class library until it is needed), or on a preemptive basis (download concepts in order from most used to least used). Through sleight of hand, you could build an OS that appears to be small enough to fit on a floppy disk, yet is infinitely extensible.
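The two loading strategies can be contrasted in a brief sketch. Everything here is hypothetical: the `ComponentStore` class, the component names, and the usage counts, which a real system would presumably gather from the registry servers.

```python
class ComponentStore:
    """Toy component loader supporting on-demand and preemptive fetching."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: component name -> code
        self.loaded = {}

    def require(self, name):
        # As-needed: download a component only when something asks for it.
        if name not in self.loaded:
            self.loaded[name] = self._fetch(name)
        return self.loaded[name]

    def prefetch(self, usage_counts, limit):
        # Preemptive: download components from most used to least used.
        order = sorted(usage_counts, key=usage_counts.get, reverse=True)
        for name in order[:limit]:
            self.require(name)
        return order[:limit]


def fake_fetch(name):
    # Stands in for a request to a nearby concept-registry server.
    return f"<code for {name}>"


store = ComponentStore(fake_fetch)
usage = {"floating_point_math": 3, "tcp_stack": 90, "window_manager": 40}

print(store.prefetch(usage, limit=2))
print("floating_point_math" in store.loaded)
store.require("floating_point_math")
print("floating_point_math" in store.loaded)
```

In this run the rarely used math library stays off the machine until the first call to `require`, while the two most popular components are already in place after boot.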
Since the concept-registry system can translate numeric concepts into multiple machine- and human-language domains, the system can be used to store documentation for machine instructions. Because the system is open, developers in many countries could contribute comments and documentation for publicly registered components. Concept registry can therefore serve as a globally distributed help file for the components registered in it.
Again, concept registry was originally developed to support language translation aids, such as tools to translate foreign words and phrases. One such application is a web browser plug-in that uses concept registry to look up translations for highlighted words and phrases. Since concept registry can index concepts in any number of languages, this plug-in can serve as a universal translation dictionary.
The concept-registry system, and programming techniques that leverage it, are still in embryonic stages of development. Consequently, your criticism and code are welcome.
Copyright © 1999, Dr. Dobb's Journal