I've been a big fan of UNIX and, now, of Linux. There is a certain logic to how things go together that made sense to me from the first time I saw it. That's why I'm glad to see more and more embedded systems go to some flavor of Linux.
However, I've also noticed a dark side to the wider acceptance of Linux in general. Some of the underlying philosophy has been lost or, if not lost, then at least diluted. Part of this is just due to the nature of GUI systems. Part of it is probably ideas leaking over as people transition from other operating systems.
What kind of philosophy am I talking about? It seems to me that Linux has been moving away from the idea of small modular tools that can be tied together easily. There is also a trend towards more opaque configuration in some newer software. Granted, there are things like D-Bus that try to fill those gaps, but they don't seem to be as widely used or understood as the classic mechanisms of pipes and text configuration files.
If you think about it, classic UNIX (and systems like Linux) had several key tenets: everything looks like a file; programs operate on their standard input and output; and there is a reasonably standard syntax for things like options, globbing, and regular expressions.
Regular expressions, of course, are not specific to UNIX. However, UNIX has always embraced them with tools like grep and awk. Because regular expression support is built into the standard C library on UNIX and Linux, I've written a lot of code for those systems that relies on regular expressions. Today, there are plenty of similar libraries for other platforms as well.
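On Linux, the C library exposes POSIX regular expressions through <regex.h> (regcomp, regexec, regerror, regfree), and you can call them from C or C++. Here is a minimal sketch of a yes/no match helper; the name posix_matches is hypothetical, and the choice of extended (ERE) syntax is an assumption.

```cpp
#include <regex.h>    // POSIX regcomp/regexec/regerror/regfree
#include <stdexcept>
#include <string>
#include <cassert>

// Hypothetical helper: returns true if `text` contains a match for the
// POSIX extended regular expression `pattern`. Throws if the pattern
// fails to compile.
bool posix_matches(const std::string& pattern, const std::string& text) {
    regex_t re;
    // REG_EXTENDED selects ERE syntax (as in egrep/awk); REG_NOSUB says
    // we only want a yes/no answer, not the match offsets.
    int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[256];
        regerror(rc, &re, msg, sizeof msg);
        throw std::runtime_error(msg);
    }
    // regexec returns 0 on a match and REG_NOMATCH otherwise.
    bool found = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return found;
}
```

For instance, posix_matches("x+z", "xxxz") is true, while the anchored pattern "^x+z$" does not match "xyz".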
If you aren't familiar with regular expressions, they are simple text strings that define patterns that can be matched in other strings. They can range from something simple like:
which would match the string xz, to something very complex like this date validation expression from RegExpLib.com:
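To get a feel for how quickly patterns grow, here is a deliberately simplified, hypothetical YYYY-MM-DD check written with C++11 std::regex. It is not the RegExpLib.com expression, and it still accepts impossible dates such as February 31. Handling those correctly is what makes real date validators so long.

```cpp
#include <regex>
#include <string>
#include <cassert>

// Hypothetical, simplified date check: four-digit year, month 01-12,
// day 01-31. It does not know how many days each month has.
bool looks_like_date(const std::string& s) {
    static const std::regex date(
        "[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])");
    // regex_match requires the whole string to match, so no ^...$ anchors.
    return std::regex_match(s, date);
}
```

With this sketch, "2013-02-28" passes and "2013-13-01" is rejected, but "2013-02-31" also passes.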
You might be thinking: Regular expressions for an embedded system? Why not? One system I was especially fond of used regular expressions to parse through input data from an external device sent via a USB serial port. Instead of hard coding the particular types of input records, a regular expression matched the record and extracted the data from the fields. If the input formats changed (and they did), it was a simple matter to edit the file that contained the regular expressions and alter the system behavior without so much as a recompile.
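The original system's code isn't shown, so what follows is only a sketch of the data-driven idea under stated assumptions. Record types and their patterns are loaded at run time from text in a hypothetical "name<TAB>pattern" format, and the capture groups become the record's fields. Changing the text changes the behavior without a recompile. The names RecordRule, load_rules, and parse_record are all hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
#include <cassert>

// One record type: a name plus the pattern that recognizes it.
struct RecordRule {
    std::string name;
    std::regex pattern;
};

// Reads rules in the hypothetical form "name<TAB>pattern", one per line.
// Blank lines, lines starting with '#', and lines without a tab are skipped.
std::vector<RecordRule> load_rules(std::istream& in) {
    std::vector<RecordRule> rules;
    std::string line;
    while (std::getline(in, line)) {
        auto tab = line.find('\t');
        if (line.empty() || line[0] == '#' || tab == std::string::npos) continue;
        rules.push_back({line.substr(0, tab), std::regex(line.substr(tab + 1))});
    }
    return rules;
}

// Tries each rule in order. On the first full match, stores the rule name
// and the capture groups (the record's fields) and returns true.
bool parse_record(const std::vector<RecordRule>& rules, const std::string& input,
                  std::string& kind, std::vector<std::string>& fields) {
    for (const auto& rule : rules) {
        std::smatch m;
        if (std::regex_match(input, m, rule.pattern)) {
            kind = rule.name;
            fields.clear();
            for (std::size_t i = 1; i < m.size(); ++i) fields.push_back(m[i].str());
            return true;
        }
    }
    return false;
}
```

In a deployed system the rules would come from a configuration file, for example through a std::ifstream passed to load_rules.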
This leads to another problem with wider Linux adoption, though. If you've been doing UNIX for the last 25 years, you are probably a regular expression wizard. If you have only been using Linux for the last week, trying to get a Raspberry Pi or a BeagleBoard to do something, you might not be ready to tackle some of the very hairy regular expressions you may need (like the date validation expression above).
One thing I've learned is that sometimes the most effective tools are ones you personally wouldn't use. A friend of mine called me last week wanting help crafting a regular expression, and I realized that, while I'm used to the terse syntax of regular expressions, that terseness is probably the main obstacle to getting them right for people who haven't used them much. Yet those same people can handle a much more complicated programming language. So why can't regular expressions be more like a programming language?
I fired up emacs and started writing some pretty straightforward code (you can download it here). The idea is very similar to my universal cross assembler. The tool provides some simple functions that can be used to build regular expressions using a more verbose programming language-like syntax.
For example, this input:
start + space + zero_or_more + any_of("ABC") + literal(":") + group(digit + one_or_more)
Results in this regular expression:
Like the universal cross assembler, all the real work is being done by the C compiler (well, in this case the C++ compiler, g++). The input gets inserted into a C++ program, compiled, and the output of the program is the regular expression. You can then take the expression and use it anywhere you need it.
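The tool's own source (recomp.h) isn't reproduced here. Below is a hypothetical reimplementation that shows how little C++ such a builder needs. Each building block is a small object, + concatenates their text, and quantifiers are postfix fragments that apply to whatever precedes them. The emitted syntax (POSIX ERE, with space mapped to [[:space:]]) is my assumption, and the output may differ from the real tool's.

```cpp
#include <string>
#include <cassert>

// A fragment of a regular expression; + concatenates fragments.
struct Re {
    std::string text;
};

inline Re operator+(const Re& a, const Re& b) { return Re{a.text + b.text}; }

// Hypothetical building blocks emitting POSIX ERE syntax.
const Re start{"^"};
const Re space{"[[:space:]]"};
const Re digit{"[[:digit:]]"};
// Quantifiers are postfix: they apply to the fragment just before them.
const Re zero_or_more{"*"};
const Re one_or_more{"+"};

inline Re any_of(const std::string& chars) { return Re{"[" + chars + "]"}; }
inline Re group(const Re& inner) { return Re{"(" + inner.text + ")"}; }

// Backslash-escapes ERE metacharacters so the text matches literally.
inline Re literal(const std::string& s) {
    const std::string meta = ".[\\()*+?{|^$";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return Re{out};
}

// The article's example input, pasted into a function body.
inline std::string example() {
    return (start + space + zero_or_more + any_of("ABC") + literal(":") +
            group(digit + one_or_more)).text;
}
```

In this sketch, example() returns ^[[:space:]]*[ABC]:([[:digit:]]+). A generated main() would only need to write that string to standard output.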
The whole compile and run process is handled by a shell script (recompile), which is actually more complicated than the C++ program. The preprocessor (which everyone seems to hate these days) allows you to write things like start and have it turn into a function call.
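As a minimal sketch of that preprocessor trick, object-like macros can map bare words onto calls. The helper functions re_start, re_digit, and re_one_or_more here are hypothetical and are not the ones in recomp.h.

```cpp
#include <string>
#include <cassert>

// Hypothetical building-block functions.
inline std::string re_start() { return "^"; }
inline std::string re_digit() { return "[0-9]"; }
inline std::string re_one_or_more() { return "+"; }

// Object-like macros: from here on, the bare words start, digit and
// one_or_more are replaced by function calls before the compiler sees them.
#define start re_start()
#define digit re_digit()
#define one_or_more re_one_or_more()

// Verbose-style input; expands to re_start() + re_digit() + re_one_or_more().
inline std::string anchored_number() { return start + digit + one_or_more; }
```

Here anchored_number() yields ^[0-9]+. One drawback is that such macros take over those words for the rest of the translation unit, so they belong only in the small generated program.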
You can skim through recomp.h to see all the syntax available. Keep in mind that there are a few flavors of regular expressions, so your mileage might vary on some items. You might also need to change the classescape functions to suit your target environment. Long term, I probably should add command-line options to handle the different regular expression target environments, but for what I needed, this did the job.
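Escaping is a good example of why the target flavor matters. In POSIX basic regular expressions (grep, sed), the characters ( ) + ? { | are ordinary unless backslashed, while in extended ones (egrep, awk) they are operators. GNU grep also treats \+ and \? as operators in basic mode, so you must not blindly backslash them. Here is a sketch of a flavor-aware escape function; the Flavor enum and this escape signature are hypothetical, not the tool's.

```cpp
#include <string>
#include <cassert>

// Hypothetical target flavors.
enum class Flavor { Basic, Extended };

// Escapes characters so that `text` is matched literally in the given flavor.
std::string escape(const std::string& text, Flavor flavor) {
    const std::string common = ".[\\*^$";      // special in both BRE and ERE
    const std::string extended_only = "()+?{|"; // operators only in ERE
    std::string out;
    for (char c : text) {
        bool special = common.find(c) != std::string::npos ||
                       (flavor == Flavor::Extended &&
                        extended_only.find(c) != std::string::npos);
        if (special) out += '\\';
        out += c;
    }
    return out;
}
```

For example, escape("a+b", Flavor::Basic) leaves the text unchanged, while escape("a+b", Flavor::Extended) produces a\+b.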
Although this tool might be useful in or out of embedded systems development, I think it really highlights the vastly underused technique of using tools like the C or C++ compiler to do work for you. This is another part of the UNIX philosophy of stringing together tools to get a desired result. Hopefully, as Linux continues to grow in embedded systems, we won't lose sight of that original ideology.