Alternatives for Parallelizing Legacy Serial Code
Applications developed using modern programming languages can become legacy code faster than expected. Multicore microprocessors have been found guilty of transforming outstanding serial code into legacy code in just a few years. However, there are many alternatives for translating multicore power into application performance by making small changes to the legacy serial code.
Are you coding a new algorithm without parallelism in mind? You're writing legacy code. Are you adding code to events without taking into account a task-based approach? You're writing legacy code. Are you writing code without considering other code running concurrently? You're writing legacy code.
I could go on explaining situations in which the code would become legacy code as soon as it's written.
It's hard to believe. Nowadays, developers are writing more legacy code than ever. They're writing too much code using an old-fashioned serial approach condemned to become legacy code.
Legacy code means problems. Therefore, there is a great question: What can you do with legacy serial code in the parallelism age? The answer is really difficult. There is no single solution. This time, I'm going to analyze the possibilities offered by batch processes or applications that take a long time to run.
Some batch processes written in legacy serial code are really complex. They weren't designed so that they could be transformed into a task-based design in a few steps. Rewriting a batch process to take advantage of parallelism is hard work.
In these cases, a task-based approach or a code refactoring process using threads could demand a lot of work. Therefore, instead of working at the thread level, you can work with multiple processes. You can launch as many processes as there are logical cores available. For example, if the batch process runs on a computer with a quad-core CPU, you will launch the application four times with different parameters. This way, you will create four independent processes, and each process is going to take advantage of the processing power offered by one of the four available cores.
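The launcher idea can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the `--worker` and `--workers` parameters are hypothetical names chosen for this example, and the one-line Python command stands in for the real legacy batch executable.

```python
import os
import subprocess
import sys


def build_commands(base_command, worker_count):
    """Return one command line per instance, each tagged with its own index."""
    return [
        base_command + ["--worker", str(index), "--workers", str(worker_count)]
        for index in range(worker_count)
    ]


def launch_all(commands):
    """Start every command as an independent OS process, then wait for all of them."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]  # exit codes, in launch order


if __name__ == "__main__":
    # One process per logical core; os.cpu_count() may return None.
    cores = os.cpu_count() or 1
    # Assumption: a stand-in for the legacy batch program, which just echoes its arguments.
    legacy = [sys.executable, "-c", "import sys; print('started', sys.argv[1:])"]
    exit_codes = launch_all(build_commands(legacy, cores))
    print("exit codes:", exit_codes)
```

Because each instance is a separate operating system process, the scheduler is free to place them on different cores, and the legacy code itself does not need to know anything about threads.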
You can change the code for the process so that it takes into account parameters received through the command line or through other ways of sending information to each instance. This way, you can distribute the work to be done among different processes. This will work for most embarrassingly parallel batch processes.
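On the side of each instance, the change can be as small as reading two parameters and selecting a slice of the work. The following sketch assumes the same hypothetical `--worker` and `--workers` parameters, and `process` is a placeholder for the existing serial logic. With the default values, a single instance processes everything, just like the original serial run.

```python
import argparse


def parse_args(argv=None):
    """Read the instance index and the total number of instances."""
    parser = argparse.ArgumentParser(description="One instance of a partitioned batch job")
    parser.add_argument("--worker", type=int, default=0,
                        help="zero-based index of this instance")
    parser.add_argument("--workers", type=int, default=1,
                        help="total number of instances launched")
    return parser.parse_args(argv)


def my_share(items, worker, workers):
    """Return every workers-th item starting at this instance's index."""
    return items[worker::workers]


def process(item):
    """Placeholder for the legacy serial work done on one item."""
    return item * item


if __name__ == "__main__":
    args = parse_args()
    all_items = list(range(10))  # placeholder for the real batch input
    for item in my_share(all_items, args.worker, args.workers):
        print(args.worker, item, process(item))
```

The strided slice gives each instance a disjoint subset, and together the instances cover every item exactly once. This only fits work whose items are independent of each other.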
You still have many problems. You need to perform some coordination activities, like collecting the concurrent outputs and reporting the progress. Another independent application (another process) can do this job.
You can trust a relational database engine to interact with the different independent processes. Most modern relational databases are designed to handle concurrent access. Each process can report its progress by writing values to a table in a database. The process that coordinates the activities can read from this table.
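A progress table of this kind could look like the sketch below. It uses SQLite from Python's standard library only to keep the example self-contained. The table layout and function names are assumptions made for illustration, and `INSERT OR REPLACE` is SQLite-specific syntax, so a server-based engine would need its own upsert statement.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open the shared database and make sure the progress table exists."""
    conn = sqlite3.connect(path, timeout=30)  # wait up to 30 s if another process holds a lock
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        "worker INTEGER PRIMARY KEY, done INTEGER NOT NULL, total INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def report_progress(conn, worker, done, total):
    """Called by each instance: overwrite this instance's row with its latest counts."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO progress (worker, done, total) VALUES (?, ?, ?)",
            (worker, done, total),
        )


def overall_progress(conn):
    """Called by the coordinator: fraction of all items completed so far."""
    done, total = conn.execute(
        "SELECT COALESCE(SUM(done), 0), COALESCE(SUM(total), 0) FROM progress"
    ).fetchone()
    return done / total if total else 0.0


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "progress.db")
    worker_conn = open_db(path)       # in practice, opened inside each instance
    coordinator_conn = open_db(path)  # in practice, opened by the coordinating process
    report_progress(worker_conn, 0, 25, 100)
    report_progress(worker_conn, 1, 75, 100)
    print("overall:", overall_progress(coordinator_conn))
```

Each instance writes only its own row, so the instances never compete for the same record, and the coordinator can poll the table as often as it needs.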
As you can see, thinking parallel is not just about threads. With a small paradigm shift, you can transform a legacy batch process into a process launcher and take advantage of multicore microprocessors. You won't be able to take full advantage. That would require a new task-based approach and new code from scratch. However, sometimes, you don't have the time to rebuild from scratch. In these cases, you can increase the performance offered by legacy serial code by making small changes. Most developers are prepared to work with relational databases and with multiple processes accessing the shared data stored in them. You can take advantage of this situation with some batch processes.
This is an alternative way to improve the performance offered by legacy code. You can even use this technique with programming languages that aren't capable of creating threads.
If you want to dive deep into the problems created by legacy code, you can read the excellent book "Working Effectively with Legacy Code", written by Michael Feathers.
Besides, if you are interested in researching the problems created by serial legacy code in Windows running on multicore microprocessors, there is an excellent article by Mike Riley, "Windows Legacy Code and Multicore Environments", in which Joe Duffy (lead developer and architect for Microsoft's Parallel Extensions to .NET) talks about this topic.