Beware of Those That Claim Linear Performance Increases
Allow me to begin this blog post with the summary: It really depends on the application.
Recently, EEMBC (Embedded Microprocessor Benchmark Consortium) released a 'free' embedded processor benchmark that we call CoreMark. For the sake of completeness (and curiosity), we included the capability to execute multiple copies on multiple cores (in fact, it contains three common implementations: PThreads, Fork (with shared memory), and Sockets). Someone recently posted E5405 scores that showed a completely linear speedup with 1, 2, and 4 cores. How is this possible?
CoreMark aims at measuring core efficiency, and nothing else. Multiple copies of CoreMark would (assuming decent scheduling by the underlying scheduler or OS) each run on its own core. There is no interaction between the copies, and synchronization only occurs at the end of the run. Thus we would expect running multiple copies to simply result in linear speedup, and indeed the posted scores show exactly that. When comparing CoreMark scores, make sure to check the compiler flags.
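The multiple-copies arrangement described above can be sketched in a few lines. This is only an illustration of the structure, not CoreMark's actual harness: the names `run_copy` and `run_copies` are made up for this sketch, the loop body is a stand-in workload rather than the CoreMark kernels, and it assumes a POSIX system where Python's `fork` start method is available. Each copy works on private data, writes its result to its own slot in shared memory, and the parent only waits at the end.

```python
import multiprocessing as mp
import time


def run_copy(slot, iterations, checksums):
    """One independent benchmark copy: private data, no locks, no messages."""
    acc = 0
    for i in range(iterations):
        acc = (acc * 31 + i) & 0xFFFF  # stand-in workload, not the CoreMark kernels
    checksums[slot] = acc  # each copy writes only its own slot


def run_copies(copies, iterations):
    """Start `copies` workers, wait for all of them, return (iterations/sec, checksums)."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    checksums = ctx.RawArray("l", copies)  # shared memory, one slot per copy
    workers = [
        ctx.Process(target=run_copy, args=(n, iterations, checksums))
        for n in range(copies)
    ]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # the only synchronization point: the end of the run
    elapsed = time.perf_counter() - start
    # Aggregate score: total iterations completed by all copies per second.
    return copies * iterations / elapsed, list(checksums)


if __name__ == "__main__":
    for copies in (1, 2, 4):
        score, _ = run_copies(copies, 2_000_000)
        print(f"{copies} copies: {score:,.0f} iterations/sec")
```

On a machine with at least four idle cores, the reported rate should grow roughly in proportion to the number of copies, since nothing in `run_copy` ever waits on another copy. The exact figures depend on the machine and on how the OS schedules the processes.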
For a true analysis of multicore devices, system factors such as cache coherency and bus arbitration mechanisms also need to be considered, as does the efficiency of synchronization primitives and the underlying scheduler (or OS). While there are other benchmarks out there to challenge multicore platforms (especially in non-embedded systems), the ones from EEMBC are quite useful for testing embedded platforms.
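To put rough numbers on how such overheads eat into scaling, one common back-of-the-envelope model is Amdahl's law, which this post does not otherwise discuss. The sketch below lumps everything that does not scale with core count (synchronization, arbitration, and so on) into a single serial fraction. That is a simplification: on real hardware these costs often grow with the number of cores, so treat the result as an optimistic upper bound. The function name `amdahl_speedup` is just a label chosen for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the run time scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (1.0, 0.95, 0.80):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):.2f}x" for n in (1, 2, 4)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Only a parallel fraction of 1.0, which is the case for fully independent benchmark copies, gives a speedup equal to the core count. Any serialized portion pulls the curve below linear, and the gap widens as cores are added.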
To return to the summary: while it is possible to see linear speedup in real life, most applications and benchmarks that use more than one core (especially the more data-intensive ones) will scale less than linearly, by an amount that varies with the workload. Of course, if you've been doing multicore system development, you have probably noticed that already.