Insights into Router Design: Implementation of Networking Protocols
Modern data networking involves a large number of networking protocols, each of which has its own domain of applicability. Some run on end stations (also called hosts), some on enterprise routers, some on provider edge devices and some on provider core systems. This is a very broad classification of networking systems, and there are other ways to classify them. A networking protocol is the specification of interactions between two or more networking elements. The sides of the conversation are often dissimilar: one may be a host, and the other a router. Most hosts today, such as a computer running Linux, can function as routers if necessary. Whenever a new protocol is specified or gains currency, implementations for these hosts often appear, especially in the open-source world, such as for Linux or FreeBSD. In several cases, the open-source implementation will include the network side of the protocol as well. Often, these implementations are very good, and in general, the networking industry owes a lot to these public-domain implementations.
There are also companies that sell complete networking stacks to other companies. The points made in this article may apply to that case as well, except that commercial networking stacks are often designed with cleaner interfaces and are usually written for use in routers and switches, so some of the items mentioned below may be less of a problem.
Adaptation of Public-Domain Implementation
And of course, there are the companies that want to implement the protocol for use in the networking systems they manufacture. As the implementation of each protocol can be quite complex and time-consuming, there is a natural tendency to search for publicly available implementations that can be customized and used under an appropriate licensing arrangement, such as the BSD license or the GNU General Public License (GPL). Depending on time-to-market and other financial constraints, a decision may also be made to buy source code from a commercial protocol stack vendor.
One may imagine that the job of adapting or porting an open-source host-based protocol implementation to a router may be just a matter of taking the source code and compiling it within the build infrastructure of the router vendor. However, this is rarely the case.
This article looks at the various challenges involved in a typical porting/adaptation effort of this nature, though not all of these may be applicable to each situation.
Look at the License
The first thing that is usually looked at is whether the source code is being distributed under the GPL or any of its variants, or under a BSD-like license. This is important for a commercial vendor trying to make money out of its product. While the vendor will want to use and appropriately credit any publicly available code, it may not want to have to freely distribute its own proprietary code.
Does it build?
These (open-source) implementations are usually written for end stations or for hosts that can also be used as routers. That is because each router vendor has its own proprietary infrastructure and software interfaces that are not publicly available for applications to use. So, there is usually no way for public-domain code to be written to conform to the architecture of a proprietary networking system. However, present-day routers do often use host operating systems such as Linux or BSD variants. That may make it easier to adapt a public-domain stack for one's own purposes. Once it is evident that the licensing arrangement is feasible for the company to use, the appropriate software developer downloads the software and attempts to build it for the target operating system, i.e. the OS on which it is expected to eventually run. If the OS it was written for, and the OS it will run on, are the same or similar, this is often not a big problem. However, problems may arise if the implementation uses certain system-specific header files that may not be available on the target system, in which case appropriate header files may have to be found or the code modified to use other utilities or data structures.
C or C++
It may be that the acquired software is in C, but the target implementation is all C++. It may be prudent in such situations to merely write C++ wrappers that interact with the C code, rather than attempt to convert it all to true C++ classes.
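One such wrapper can be sketched as follows. The C interface shown is purely illustrative (a real stack exports its own names), and the stub bodies stand in for the downloaded C sources so the example is self-contained. The C code stays untouched; only the boundary is C++.

```cpp
#include <cstdlib>
#include <stdexcept>

// Stand-in for the acquired C stack's interface. The names and the stub
// bodies are invented for this example; in a real port the declarations
// would come from the downloaded sources.
extern "C" {
    typedef struct proto_ctx { int instance_id; int packets; } proto_ctx;

    proto_ctx *proto_create(int instance_id) {
        proto_ctx *c = (proto_ctx *)std::malloc(sizeof *c);
        if (c) { c->instance_id = instance_id; c->packets = 0; }
        return c;
    }
    void proto_destroy(proto_ctx *c) { std::free(c); }
    int proto_handle_packet(proto_ctx *c, const unsigned char *, int len) {
        c->packets++;
        return len;   // pretend the C code consumed the packet
    }
}

// Thin RAII wrapper: create/destroy become constructor/destructor, and the
// rest of the C++ code base never touches the raw C handle.
class ProtoInstance {
public:
    explicit ProtoInstance(int id) : ctx_(proto_create(id)) {
        if (!ctx_) throw std::runtime_error("proto_create failed");
    }
    ~ProtoInstance() { proto_destroy(ctx_); }
    ProtoInstance(const ProtoInstance &) = delete;
    ProtoInstance &operator=(const ProtoInstance &) = delete;

    int handlePacket(const unsigned char *buf, int len) {
        return proto_handle_packet(ctx_, buf, len);
    }
    int packetsSeen() const { return ctx_->packets; }
private:
    proto_ctx *ctx_;
};
```

The wrapper adds ownership and error handling at the boundary without disturbing the internals of the C code, which keeps later upstream merges feasible.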
Integrate with the system architecture
Once it is clear that there are no major compiler or linker issues to deal with, one has to figure out how to integrate what is usually a free-running Linux or FreeBSD process with the rest of the router/switch product. Router vendors these days often use popular operating systems, but each protocol will not necessarily run as its own free-running process. For instance, the downloaded implementation may be a process written in C, while the router may contain a multi-threaded process written in C++ that runs several protocols. Or each protocol may be split into several components, each of which runs on a different processor. Decisions may have to be made on how best to partition the available implementation so that its various parts run on different parts of a possibly distributed router or switch.
How does it talk to the data plane?
A networking protocol is typically a mechanism for exchanging control information that eventually influences the flow of data traffic through the system. So, the output of the protocol is often some communication to the data plane to program its hardware or software tables for the flow of data traffic. For instance, the output of the routing protocol OSPF is a set of IP routes that make their way to a forwarding table, so that incoming IP packets can be looked up by their destination IP address and forwarded to their appropriate destinations. The public implementation may talk to the kernel to communicate with the data plane. However, there is no guarantee (in fact, it is rather unlikely) that the kernel tables will be used for forwarding traffic on the vendor's router. This means that the programming interfaces (APIs) that the protocol implementation has with the kernel will have to be modified to use the APIs of the target router software.
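One common way to make this re-targeting manageable is to hide the data-plane boundary behind a single interface, so the protocol code calls the same function whether routes end up in kernel tables (host build) or in the vendor's forwarding tables (router build). The sketch below is illustrative only: the types are invented for the example, and the in-memory table stands in for either back end.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// A route as the protocol produces it (fields are illustrative).
struct Route {
    uint32_t prefix;      // destination network, host byte order
    uint8_t  prefix_len;
    uint32_t next_hop;
};

// Abstract data-plane boundary: the protocol only ever sees this interface.
class Fib {
public:
    virtual ~Fib() = default;
    virtual bool addRoute(const Route &r) = 0;
};

// Host build: an implementation of Fib would wrap the kernel interface.
// Router build: it would wrap the vendor's table-programming API.
// For standalone testing, a plain in-memory table works as a stand-in.
class InMemoryFib : public Fib {
public:
    bool addRoute(const Route &r) override {
        table_[{r.prefix, r.prefix_len}] = r.next_hop;  // update or insert
        return true;
    }
    std::size_t size() const { return table_.size(); }
private:
    std::map<std::pair<uint32_t, uint8_t>, uint32_t> table_;
};
```

With this shape, porting to the router means writing one new Fib implementation rather than editing every call site in the protocol code.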
Is it dynamically reconfigurable?
Just as the interfaces from the control plane to the data plane may have to be completely re-worked, the same is often true of how the protocol is configured and managed. Most protocols require large amounts of configuration, including names, addressing, and other information. On a Linux host, for example, these are usually placed in a configuration file or given on the command line to the process as it is started. However, on a commercial router, it is not practical to restart a process or thread when the configuration is changed. This is actually a big difference in concept. Host implementations are rarely dynamically re-configurable. Router implementations have to be so. This implies that pathways have to be created in the code to take user input and modify the operation of the protocols dynamically. This can lead to several issues to do with how memory is managed. For instance, it is possible that the public implementation uses a fair amount of global data structures, assuming that they remain for the lifetime of the process. However, in a router, that assumption is rarely true.
Protocol implementations, once they read their configuration, usually cache it in memory. However, in a modern router, this may not be in the protocol's process or thread at all. The configuration may be managed by another process, potentially on another processor.
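A sketch of what dynamic reconfigurability can look like in code: instead of reading a configuration file once at startup, the ported protocol registers handlers that the management plane invokes when a parameter changes at runtime. The registry class and the parameter name below are hypothetical, invented for the example.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Illustrative sketch: the protocol registers a handler per configuration
// knob; the (hypothetical) management process calls apply() whenever the
// user changes that knob, so no process restart is needed.
class ConfigRegistry {
public:
    using Handler = std::function<void(const std::string &value)>;

    void onChange(const std::string &key, Handler h) {
        handlers_[key] = std::move(h);
    }

    // Invoked from the management plane on a runtime configuration edit.
    // Returns false for a knob the protocol never registered.
    bool apply(const std::string &key, const std::string &value) {
        auto it = handlers_.find(key);
        if (it == handlers_.end()) return false;
        it->second(value);
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

The hard part of the port is then inside each handler: it must adjust live state (timers, sessions, memory) rather than assume the value was fixed at process start.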
How does it do IPC?
The protocol in question may be an enabling protocol for other protocols. For instance, BFD information is used by OSPF, and PPPoE information is used by PPP. The mechanism for IPC (Inter-Process Communication) may be quite different on a host versus the target router. On the host, the implementation may use sockets, files or kernel tables, but the target implementation is likely to be very different: the vendor's IPC mechanism is often proprietary.
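One way to insulate the protocol from any particular IPC transport is to define a transport-neutral message encoding; sockets on a host and a proprietary bus on the router can then carry the same bytes. The simple type-length-value layout below is purely illustrative.

```cpp
#include <cstdint>
#include <vector>

// An illustrative TLV message: 2-byte type, 2-byte length, then the value.
// Both fields are serialized big-endian regardless of the host CPU, so the
// encoding is the same on every transport and every processor.
struct IpcMsg {
    uint16_t type;
    std::vector<uint8_t> value;
};

std::vector<uint8_t> encode(const IpcMsg &m) {
    std::vector<uint8_t> out;
    out.push_back(m.type >> 8);
    out.push_back(m.type & 0xff);
    uint16_t len = (uint16_t)m.value.size();
    out.push_back(len >> 8);
    out.push_back(len & 0xff);
    out.insert(out.end(), m.value.begin(), m.value.end());
    return out;
}

bool decode(const std::vector<uint8_t> &in, IpcMsg &m) {
    if (in.size() < 4) return false;                    // truncated header
    m.type = (uint16_t)((in[0] << 8) | in[1]);
    uint16_t len = (uint16_t)((in[2] << 8) | in[3]);
    if (in.size() != 4u + len) return false;            // length mismatch
    m.value.assign(in.begin() + 4, in.end());
    return true;
}
```

Keeping the wire format independent of the transport means only the thin send/receive layer has to be rewritten against the vendor's IPC.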
Can it call you back?
Interaction with other protocols also means that various callback schemes may have to be implemented afresh to deal with callbacks from the protocol code into other modules in the system. These may have been direct function calls in the case of the downloaded code, or not even implemented. Besides, the infrastructure of the vendor's software may be distributed not only with regard to processes, but also with regard to processors.
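A registration-based scheme of this kind might look like the following sketch, where direct function calls are replaced by subscribed callbacks; in a distributed router the dispatch would cross an IPC boundary instead of calling in-process functions. All names here are invented for the example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative observer hook: instead of the downloaded code calling, say,
// an OSPF function directly when a BFD session drops, interested modules
// subscribe and the protocol publishes the event.
class EventHook {
public:
    using Cb = std::function<void(const std::string &detail)>;

    void subscribe(Cb cb) { subs_.push_back(std::move(cb)); }

    void publish(const std::string &detail) {
        // In-process dispatch; a distributed system would serialize the
        // event and send it over IPC to subscribers on other processors.
        for (auto &cb : subs_) cb(detail);
    }

private:
    std::vector<Cb> subs_;
};
```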
States and Events
A protocol state machine can change state only if it is informed of an event. One type of event is the change of state of an interface such as an Ethernet link. Every protocol implementation will have a mechanism to receive such events. However, there is no guarantee that the event distribution mechanism assumed by the available implementation has anything in common with the software architecture of the vendor's system. This is an important issue to address, as is the similar issue that may be found in the case of timer or packet events.
The host-based implementation may need to have only one instance of the protocol running within a process. However, a router may need several, perhaps thousands of instances of the protocol running at the same time. This means the memory model of the implementation may have to be substantially modified to suit the purpose.
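The shift from one global instance to many can be sketched as follows: all per-instance state lives in a context that is looked up for each event, never in globals. The states, events and keying by interface name below are illustrative, not taken from any particular protocol.

```cpp
#include <cstddef>
#include <map>
#include <string>

// A toy three-state machine, driven by external events.
enum class State { Down, Init, Up };
enum class Event { LinkUp, LinkDown, HelloSeen };

// Everything the state machine needs lives here, one copy per instance.
struct Instance {
    State state = State::Down;
};

class ProtoTable {
public:
    // Look up (or create) the instance for this interface, then run the
    // transition. No global state is touched.
    State handle(const std::string &ifname, Event e) {
        Instance &inst = instances_[ifname];
        switch (e) {
        case Event::LinkUp:
            if (inst.state == State::Down) inst.state = State::Init;
            break;
        case Event::HelloSeen:
            if (inst.state == State::Init) inst.state = State::Up;
            break;
        case Event::LinkDown:
            inst.state = State::Down;
            break;
        }
        return inst.state;
    }
    std::size_t count() const { return instances_.size(); }
private:
    std::map<std::string, Instance> instances_;
};
```

If the downloaded code keeps its state in file-scope globals, converting it to this shape (a context structure threaded through every function) is often the largest single piece of the porting work.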
Endian-ness and Byte Ordering
Great care must be taken to ensure that the protocol implementation is correct in its use of byte-ordering. Many a problem has been created by the typecasting of packets into structures assuming a Big-endian environment. Careful endian conversions may have to be done to ensure correctness. If the downloaded implementation assumes the endian-ness of the processor, then each field in the protocol header must be carefully analyzed to make sure there are no byte-ordering issues when it finally runs on the target platform.
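As a sketch of the usual remedy, each multi-byte field is converted explicitly with the standard ntohs()/ntohl() routines after copying the buffer into a struct; the memcpy() also avoids alignment traps that a direct pointer cast can cause on some processors. The header layout below is invented for the example.

```cpp
#include <arpa/inet.h>   // ntohs, ntohl (POSIX)
#include <cstdint>
#include <cstring>

// An illustrative protocol header as it appears on the wire, in network
// (big-endian) byte order. No padding: 2 + 2 + 4 bytes.
struct WireHeader {
    uint16_t type;
    uint16_t length;
    uint32_t router_id;
};

// Copy the raw bytes out of the packet buffer, then convert every
// multi-byte field to host byte order. This is correct on both big- and
// little-endian CPUs; a bare struct cast is correct only on big-endian.
WireHeader parse(const uint8_t *buf) {
    WireHeader h;
    std::memcpy(&h, buf, sizeof h);
    h.type      = ntohs(h.type);
    h.length    = ntohs(h.length);
    h.router_id = ntohl(h.router_id);
    return h;
}
```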
High Availability
These days, in certain technologies, most of the differences between vendors have come down to how fast they react to failure conditions such as a link going down, a software process crashing, or a line card or other hardware malfunctioning. The internal recovery procedure, usually called High Availability (HA), often consists of two parts: an initial movement or caching of various kinds of data in other processes or processors, and, after a failure, re-creation of the state from the surrounding processors and processes. It is unlikely that the free software contains any implementation of High Availability, because any HA implementation has to be tied to the target platform. Most features these days are not considered complete on high-performance networking devices unless they also contain a mechanism to deal with failure. So, the HA implementation usually has to be written from scratch.
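The two-part structure described above can be sketched as follows: the active copy mirrors a minimal state record into a checkpoint store (here just an in-memory map standing in for a peer process or processor), and a standby copy rebuilds from it after a failure. This is entirely illustrative; real checkpointing is tied to the target platform's redundancy infrastructure.

```cpp
#include <cstdint>
#include <map>

// The minimal state we choose to checkpoint per session (illustrative).
struct SessionState {
    uint32_t peer;
    uint32_t last_seq;
};

// Stand-in for the external checkpoint store, keyed by session id.
using Checkpoint = std::map<uint32_t, SessionState>;

class Protocol {
public:
    void openSession(uint32_t id, uint32_t peer, Checkpoint &ckpt) {
        sessions_[id] = {peer, 0};
        ckpt[id] = sessions_[id];          // part 1: mirror state outward
    }
    void advance(uint32_t id, Checkpoint &ckpt) {
        sessions_[id].last_seq++;
        ckpt[id] = sessions_[id];          // keep the mirror current
    }
    // Part 2: called on the standby after the active copy dies.
    void restore(const Checkpoint &ckpt) { sessions_ = ckpt; }

    uint32_t lastSeq(uint32_t id) const { return sessions_.at(id).last_seq; }
private:
    std::map<uint32_t, SessionState> sessions_;
};
```

The design question buried in the sketch is what belongs in SessionState: checkpointing too much makes the hot path slow, while checkpointing too little means the standby cannot rebuild without disturbing the peer.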
Plan for Debugging
As the protocol implementation originated as a standalone process, however, there can be some definite aids to debugging, if the adaptation of the code is planned right. When the various modules are modified to fit the target system, care must be taken to also write test clients that can be hooked up to these software interfaces, so that much of the modified code can still be exercised on a standalone host. Testing and debugging with standalone processes is often much faster and easier than testing on the target platform.
Maintenance of the Core Logic
Another problem is the maintenance of the core protocol logic. For a public distribution from, say, a well-regarded open source project, new versions are periodically released in which bugs are fixed. If the protocol code has been altered beyond recognition during integration, it can become difficult to pick up those bug fixes. One way to handle this is to always keep the original distribution in the repository, so that it can be compared to the latest upstream version, and any differences can be selectively and manually applied to the target software.
Think and Decide
Thus, the porting or adaptation of a publicly available protocol implementation to a vendor's router/switch platform is a serious project, and should be attempted with great care. If the free version is a good and reliable distribution, this may often be a better approach than writing the protocol implementation entirely from scratch, especially for complex networking protocols. The time that may be saved by avoiding the writing and testing of the actual protocol state machine can turn out to be quite significant for the overall lifecycle of the product. However, the time required for the above aspects of the work should be carefully considered and scheduled in, for the project and the product to turn out to be a success.
About the author --- Rajesh Kumar Venkateswaran has been developing software in the data networking and telecommunications industry for many years. He has led teams, architected solutions, and created new products in this space.