How Many Threads Are Enough?
There is a point where the overhead of managing multiple processors outweighs the speedup and other advantages gained from parallelization. The old adage "you can never have enough processors" is simply not true.
Communication between threads or synchronization between processors comes at a cost. The complexity of the synchronization, or the amount of communication between processors, can require so much computation that it degrades the performance of the tasks doing the actual work. In these cases it is more effective to write the program based on a sequential model.
For example, if we wish to sort a list of 100 numbers, we could divide the list into groups of 10, sort each group in parallel, and then merge the sorted groups into one sorted list. But dividing the list into groups of 10, communicating those groups to each task, and then merging the results into a single sorted list, all while trying to avoid data races, would take more time than simply sorting the numbers with a sequential method. On the other hand, if we had a few terabytes of numbers, the parallel approach would be far more attractive.
The question is: how many processes, tasks, or threads should a program be divided into? Is there an optimal number of processors for any given parallel program? At what point does adding more processors or computers to the computation pool slow things down instead of speeding them up? It turns out that the numbers change depending on the program. Some scientific simulations may max out at several thousand processors, while for some business applications several hundred might be sufficient. For some client-server configurations, eight processors might be optimal, while a ninth would cause the server to perform poorly.
We might reach the limits of the software's processes before we reach the optimum number of processors or computers. Likewise, we might see diminishing returns from the hardware before we reach the optimum number of concurrently executing tasks.
Ideally, something in the decomposition of the problem, or in the decomposition of the solution, determines how many threads or processes are necessary. However, the actual implementation of the model can introduce so much complexity and overhead that a different model may have to be selected.
While we are strong advocates of the notion that the techniques, tools, languages, and software libraries should follow the decomposition model, and not the other way around, the complexity of implementing the solution decomposition model has to be considered. There are also many variations on halting problems in applications that include multithreading and multiprocessing. As the number of cooperating tasks in an application increases, the complexity of their interdependencies increases. This can lead to very fickle and brittle software implementations. When multiple tasks are cooperating to provide the solution to some problem, what happens if one or more of the tasks fails? Should the program halt, or should the work be redistributed somehow? This is a problem even if we have only two concurrently executing tasks. The difficulty of resolving possible task failures rises exponentially as the number and interdependency of threads or processes in a single application increases.
This is an excerpt from our book, Professional Multicore Programming: Design and Implementation for C++ Developers, Chapter 3: The Challenges of Multicore Programming.