Generally, functional problems can be easily reproduced by hooking up a traffic generator to the system in the lab and sending just a few packets, ideally a single one. This is why they are called "deterministic problems": every time you run your simple experiment, the system behaves the same way, as opposed to stability problems, which often need a variable sequence of operations and input packets over a significant amount of time to be triggered.
As the problem can be reproduced with a few packets or even a single one, you can take your time to analyze the state of the system in detail after each packet. You can even afford the luxury of using a traditional debugger and following your packet step-by-step through the processing pipeline as your breakpoints are hit.
Good examples of such problems include output packets that come out malformed (some fields differ from what is expected), on the wrong output interface (a routing problem), or not at all (they are getting dropped somewhere along the processing pipeline).
Implement statistics counters for each possible exit out of the processing pipeline. An even better solution is to implement them for each block of the pipeline. The statistics counters should be designed so that no packets ever get "lost", either at the pipeline level or at the block level: the number of input packets should always equal the number of output packets plus the number of discarded packets. Statistics counters are the Swiss army knife of debugging packet-processing systems.
For example, assume that pipeline block A can send the input packet out to block B or block C after processing, or discard the packet as a result of any of the conditions D, E, or F being met. If each of these paths has a counter associated with it that gets incremented every time a packet takes that path, then the packets can be traced down through the pipeline regardless of whether block A is working correctly or not. Remember, you can send just one packet at a time, so by looking at these counters you can easily find out what happens with each packet that is injected into the system by the traffic generator.
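The per-path counters for block A can be sketched in C as follows; the struct and function names are assumptions made for illustration, not part of any particular system:

```c
#include <stdint.h>

/* Hypothetical per-path counters for pipeline block A: every exit
 * path (to block B, to block C, or one of the drop conditions D/E/F)
 * increments exactly one counter, so the block-level invariant
 * "in == out + dropped" always holds. */
struct block_a_stats {
    uint64_t in;          /* packets entering block A             */
    uint64_t out_to_b;    /* packets forwarded to block B         */
    uint64_t out_to_c;    /* packets forwarded to block C         */
    uint64_t drop_d;      /* packets discarded due to condition D */
    uint64_t drop_e;      /* packets discarded due to condition E */
    uint64_t drop_f;      /* packets discarded due to condition F */
};

/* Invariant check: no packet may "get lost" inside the block. */
int block_a_stats_consistent(const struct block_a_stats *s)
{
    return s->in == s->out_to_b + s->out_to_c +
                    s->drop_d + s->drop_e + s->drop_f;
}
```

A consistency check like this can be run from the debug CLI after every injected packet; a failing check immediately points at a code path that was not instrumented.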
It is inefficient to find out while debugging that you are unable to trace all the packets through the system because not all of your code paths have been consistently instrumented. Basically, some packets got lost somewhere within the system and you have no way of knowing where exactly this took place and why. One common mistake people make is to instrument just the main path and forget about the secondary paths through the pipeline. Usually the problems occur exactly on these paths which get frequently overlooked, as people focus on the functionality and performance of the main path.
The drawback of this approach is that too many statistics counters can affect the performance of the system due to the memory accesses involved. Some people would say that this is yet another manifestation of the famous Heisenberg uncertainty principle, according to which the state of the system cannot be observed without modifying the state of that system. The simple workaround is to conditionally compile these counters into the code, so that they are enabled during functional debugging and disabled for performance reasons after the functional testing has been completed.
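Conditional compilation of the counters can be done with a simple macro; the names `PIPELINE_DEBUG_STATS` and `STATS_INC` below are assumptions for the sketch:

```c
#include <stdint.h>

#define PIPELINE_DEBUG_STATS 1  /* defined here: this is the debug build */

/* When PIPELINE_DEBUG_STATS is defined, STATS_INC is a real increment;
 * otherwise it compiles to nothing and costs no memory accesses. */
#ifdef PIPELINE_DEBUG_STATS
#define STATS_INC(counter)  ((counter)++)
#else
#define STATS_INC(counter)  ((void)0)
#endif

uint64_t drop_d_counter;  /* hypothetical counter for drop condition D */

void on_condition_d(void)
{
    STATS_INC(drop_d_counter);  /* disappears entirely in the release build */
    /* ... discard the packet ... */
}
```

In the release build, `PIPELINE_DEBUG_STATS` is simply left undefined, so the instrumentation carries zero run-time cost.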
Do not start optimizing for performance before all the functional problems have been solved.
Make sure that the system has passed all the functional testing before focusing on performance optimizations, otherwise all the optimization work is useless. The optimization would have to be completed all over again after the functional problems have been discovered, investigated and fixed. Remember, no networking box will ever be accepted for deployment if functionally flawed, no matter how amazing the performance numbers for some of the paths may be.
Building a command-line interface (CLI) into your application lets users interact with the system at runtime without having to stop it under debugger control to inspect its state. Most modern networking equipment provides this feature for system configuration (e.g., for listing, adding, or deleting entries from application tables such as the routing table or the ARP table), so your application should provide it as well.
As long as the CLI is in place for system configuration, why not add support for system debugging into the CLI as well?
When building the CLI, the commands should allow:
- Inspection of the statistics counters built into the application
- Inspection and modification of memory addresses and device registers, allowing access to the tables and data structures maintained by the application, packet descriptors and packet buffers and to any memory address in general
The benefits of the CLI are that it allows inspecting the system state quickly and it provides a single entry point for accessing all the debugging tools at hand. Ideally, you send a packet and then just hit the keyboard once, and the full state of the system is conveniently displayed for you.
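A debug CLI can be as simple as a table of command names mapped to handler functions. The following is a minimal sketch; the command name, counter, and dispatch logic are illustrative assumptions (a real CLI would tokenize arguments rather than match prefixes):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef void (*cli_handler_t)(const char *args);

struct cli_cmd {
    const char   *name;
    cli_handler_t handler;
};

static uint64_t rx_count = 42;  /* example statistics counter */

static void cmd_show_stats(const char *args)
{
    (void)args;
    printf("rx_count=%llu\n", (unsigned long long)rx_count);
}

static const struct cli_cmd cli_table[] = {
    { "show-stats", cmd_show_stats },
};

/* Dispatch one command line; returns 0 on success, -1 if unknown. */
int cli_dispatch(const char *line)
{
    for (size_t i = 0; i < sizeof(cli_table) / sizeof(cli_table[0]); i++) {
        size_t len = strlen(cli_table[i].name);
        if (strncmp(line, cli_table[i].name, len) == 0) {
            cli_table[i].handler(line + len);
            return 0;
        }
    }
    return -1;
}
```

New debug commands (queue monitor calls, memory peeks, counter dumps) are added by appending one entry to `cli_table`, which keeps all debugging tools behind the single entry point the text describes.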
There is probably no need to stress how useful printf messages are for debugging, especially in the early stages of development. They do slow down the system and they will never make it into the release code, but the general consensus among engineers is that they are very valuable for debugging.
Sometimes the printf mechanism is not readily available, usually on special-purpose processor cores that are part of the programmable accelerators and do not have access to any output port to send the messages to. The software running on these cores, usually firmware, is very difficult to debug when there is no mechanism for getting messages out of them.
The workaround is to emulate printf by assigning a memory buffer to these cores that they can use to log data in a predefined format, either statistics counters or strings. Another general-purpose core can periodically poll this buffer, read the data, decode it, and print it on the screen, or the user can trigger a new read of the log buffer through the CLI.
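One way to implement such a shared log buffer is a single-producer/single-consumer ring indexed by head/tail pointers; the layout and function names below are assumptions for the sketch (a real implementation on two cores would also need the appropriate memory barriers or cache management for the shared region):

```c
#include <stdint.h>

#define LOG_BUF_SIZE 256

/* Hypothetical log buffer shared between a firmware core (producer),
 * which has no output port, and a general-purpose core (consumer),
 * which drains the buffer and prints its contents. */
struct log_buf {
    volatile uint32_t head;       /* advanced by the producer core */
    volatile uint32_t tail;       /* advanced by the consumer core */
    char              data[LOG_BUF_SIZE];
};

/* Producer side: append one byte if space remains. */
int log_put(struct log_buf *lb, char c)
{
    uint32_t next = (lb->head + 1) % LOG_BUF_SIZE;
    if (next == lb->tail)
        return -1;                /* buffer full: drop the byte */
    lb->data[lb->head] = c;
    lb->head = next;
    return 0;
}

/* Consumer side: drain up to max bytes into out[], return the count. */
uint32_t log_drain(struct log_buf *lb, char *out, uint32_t max)
{
    uint32_t n = 0;
    while (lb->tail != lb->head && n < max) {
        out[n++] = lb->data[lb->tail];
        lb->tail = (lb->tail + 1) % LOG_BUF_SIZE;
    }
    return n;
}
```

The consumer can call `log_drain` periodically, or on demand from a CLI command, and format the recovered bytes for display.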
Generally, the blocks of the packet-processing pipeline are interconnected through queues of messages, where a message can be a packet descriptor, a request message or a response message.
It would be very useful during debugging to inject one packet at a time and trace the path of that packet through the various blocks of the pipeline by determining which queues have been transited by the packet in question. The possible cases for the state of each queue after the packet transition through the system are:
- The write pointer has not been modified: the packet was not written to this queue
- The write pointer was incremented, but the read pointer is unchanged: the packet (or a message associated with it) was written to the queue, but not read from the queue, so there is a problem with the consumer of the queue, as it does not read the messages from its input queue
- Both the write and the read pointers have been incremented: the packet did transit through this queue, as it was written to the queue and also read from the queue later on
The steps to build the queue monitor are:
- Initialize the monitor: read the initial values for the write and read pointers for all the queues
- Send one packet and call the monitor: read the current values of the write and read pointers for all the queues and diff them against their previous values to identify which queues have been transited by the packet and which have not
The monitor initialization and call are usually implemented as CLI commands. The monitor call command prints out the IDs of the queues that have been transited by the packet. Once the monitor has been initialized, there is no need to initialize it again for every new packet that is injected, as the current values of the queue pointers are already known as a result of reading them for the previous packet, and these values become the initial values for the new iteration. The monitor needs to be reinitialized only after one or more packets have been injected into the system without calling the monitor, as the previously read values of the queue pointers are by then obsolete.
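The heart of the monitor is the before/after comparison of each queue's pointers, which maps directly onto the three cases listed above. A minimal sketch, with assumed names and a fixed queue count:

```c
#include <stdint.h>

#define NUM_QUEUES 4

/* Snapshot of the write/read pointers of every inter-block queue,
 * taken at monitor initialization and again at each monitor call. */
struct queue_snapshot {
    uint32_t wr[NUM_QUEUES];
    uint32_t rd[NUM_QUEUES];
};

enum queue_verdict {
    Q_NOT_TRANSITED,  /* write pointer unchanged: packet never reached it */
    Q_STUCK,          /* written but not read: consumer-side problem      */
    Q_TRANSITED       /* written and later read: normal transit           */
};

/* Classify queue q by diffing its pointers before and after the packet. */
enum queue_verdict classify_queue(const struct queue_snapshot *before,
                                  const struct queue_snapshot *after,
                                  int q)
{
    if (after->wr[q] == before->wr[q])
        return Q_NOT_TRANSITED;
    if (after->rd[q] == before->rd[q])
        return Q_STUCK;
    return Q_TRANSITED;
}
```

A CLI monitor-call command would loop over all queues, print the verdict for each, and then save the `after` snapshot as the `before` snapshot for the next injected packet.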
For some scenarios, several messages are created and written to the same output queue for the same packet. For example, the IP fragmentation block might produce more than one output packet for the same input packet. If not all the messages are read from its output queue by the next consumer block in line, then there might be a problem with the consumer block.
For some other scenarios, it is meaningful to inject more than one packet between two consecutive monitor calls. One example is the IP reassembly scenario, where several IP fragments have to be received before the IP reassembly block produces one output packet to its output queue.
In Part 2 of this article, I examine issues related to stability and performance problems.