Visualizing Context Switches that Cross Cores with Visual Studio 2010
Context switches that cross from one logical core to another can reduce the performance of your application. The Premium and Ultimate editions of Visual Studio 2010 let you visualize this problem on Windows Vista, Windows 7, Windows Server 2008, or Windows Server 2008 R2.
When a context switch crosses from one logical core to another, a thread that was running on one physical core might move to a completely different physical core. Cross-core context switches affect overall throughput in multi-threaded applications, but even a single-threaded application running on a multicore CPU suffers from them. A simple C# example, combined with the new concurrency profiling features in Visual Studio 2010, will help you understand and visualize cross-core context switches.
The following C# code shows a simple, single-threaded Windows console application that consumes CPU cycles by appending millions of strings to a StringBuilder instance.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ContextSwitch
{
    class Program
    {
        private const int MAX_NUMBER = 7000000;
        private static void Main(string[] args)
        {
            var sb = new StringBuilder(MAX_NUMBER * 7);
            for (int i = 0; i < MAX_NUMBER; i++)
            {
                // Append 7 strings
                sb.Append("START");
                sb.Append(i.ToString());
                sb.Append((i / 2).ToString());
                sb.Append((i / 3).ToString());
                sb.Append((i / 4).ToString());
                sb.Append((i / 5).ToString());
                sb.Append("STOP");
            }
            // Just write the first letter of the resulting string
            Console.WriteLine(sb.ToString()[0]);
        }
    }
}
If you use the concurrency profiling features in Visual Studio 2010 Premium or Ultimate to visualize the behavior of this single-threaded application on a multicore CPU, you can analyze its context switches in the Cores view. I use a single-threaded application as the example because it makes it easy to see what happens under the hood with just the main thread.
The following snapshot shows the Cores view with the results of profiling this application on a computer with a quad-core CPU with Hyper-Threading. The CPU has eight logical cores, also known as hardware threads.
The profiler displays a graph with visual timelines that show how each thread was mapped to the available logical cores. This managed application doesn't have any code that controls thread affinity. Remember that thread affinity tells the operating system scheduler which logical core, or set of logical cores, a thread is allowed to run on. However, thread affinity isn't convenient to use from managed code.
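For illustration only, the following sketch shows the coarsest form of affinity that managed code can request: restricting the whole process to one logical core through the Process.ProcessorAffinity property. It is not part of the profiled example. The class name AffinitySketch and the helper BuildAffinityMask are hypothetical names introduced here, and the choice of logical core 0 is arbitrary.

```csharp
using System;
using System.Diagnostics;

namespace ContextSwitch
{
    public static class AffinitySketch
    {
        // Hypothetical helper: builds a bit mask in which only the bit
        // for the given logical core is set (core 0 -> 0x1, core 2 -> 0x4).
        public static long BuildAffinityMask(int logicalCore)
        {
            return 1L << logicalCore;
        }

        public static void Main(string[] args)
        {
            using (Process current = Process.GetCurrentProcess())
            {
                // Assumption: logical core 0 exists and is a reasonable choice.
                // After this call, every thread in the process is limited to it.
                current.ProcessorAffinity = (IntPtr)BuildAffinityMask(0);
                Console.WriteLine("Affinity mask: 0x{0:X}",
                    current.ProcessorAffinity.ToInt64());
            }
        }
    }
}
```

Because the mask applies to the entire process, it also keeps the runtime's own threads off the other cores, which is one reason why this approach is rarely a good fit for real applications.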
This graph shows how the runtime and the operating system scheduler distribute the threads across the logical cores during their lifetime. When a thread moves from one core to another, a cross-core context switch occurs, and the graph's legend summarizes the total number of these cross-core context switches for each thread. The Cores view shows how the different threads created by the application run on the eight logical cores. In this case, a single thread suffers from many cross-core context switches: the green bars that represent the main thread appear on many different logical cores.
While the application was being profiled, the main thread had 417 cross-core context switches. Each of these moves can leave behind the data that the thread had warmed up in the caches of the previous core, so this number has an impact on overall performance.
You can use the zoom slider to examine the cross-core context switches in more detail at specific times. The following snapshot shows 19 cross-core context switches in less than 400 milliseconds.
The next snapshot shows another three cross-core context switches that move the main thread from logical core 0 to logical core 2, from logical core 2 to logical core 4, and then from logical core 4 to logical core 6.
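The profiler is the accurate way to count these switches. As a rough cross-check from inside the program, a thread can periodically ask Windows which logical core it is running on and count how often the answer changes. The sketch below does this with the Win32 function GetCurrentProcessorNumber (available on Windows Vista and later) called through P/Invoke. The class name CoreSampler, the helper CountCoreChanges, and the SAMPLE_INTERVAL value are assumptions made for this illustration. The workload is also a lighter variant of the original loop, and sampling only detects moves that happen between two samples, so the result undercounts the real number.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

namespace ContextSwitch
{
    public static class CoreSampler
    {
        // Win32 API: returns the number of the logical processor
        // (within its processor group) that the calling thread is running on.
        [DllImport("kernel32.dll")]
        private static extern uint GetCurrentProcessorNumber();

        private const int MAX_NUMBER = 7000000;
        // Hypothetical sampling interval, chosen arbitrarily for this sketch
        private const int SAMPLE_INTERVAL = 10000;

        // Counts how many times consecutive samples report a different core.
        public static int CountCoreChanges(IList<uint> samples)
        {
            int changes = 0;
            for (int i = 1; i < samples.Count; i++)
            {
                if (samples[i] != samples[i - 1])
                {
                    changes++;
                }
            }
            return changes;
        }

        public static void Main(string[] args)
        {
            var samples = new List<uint>(MAX_NUMBER / SAMPLE_INTERVAL + 1);
            var sb = new StringBuilder();
            for (int i = 0; i < MAX_NUMBER; i++)
            {
                sb.Append("START");
                sb.Append(i.ToString());
                sb.Append("STOP");
                if (i % SAMPLE_INTERVAL == 0)
                {
                    // Record the current logical core and reset the buffer
                    // so that memory use stays small.
                    samples.Add(GetCurrentProcessorNumber());
                    sb.Clear();
                }
            }
            Console.WriteLine("Observed core changes (lower bound): {0}",
                CountCoreChanges(samples));
        }
    }
}
```

For the sequence described above (logical core 0, then 2, then 4, then 6), CountCoreChanges reports three changes, which matches the three cross-core context switches visible in the snapshot.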
Most modern CPU microarchitectures include at least one level of cache memory that is shared among all the physical cores. One of the reasons for this shared last-level cache is that CPU manufacturers want to reduce the impact of cross-core context switches on cached data. A shared last-level cache usually improves overall throughput.
When you write code that creates tasks with C# or Visual Basic and .NET Framework 4, the underlying threads also suffer from cross-core context switches. A lot of work is being done to reduce undesired cross-core context switches in most modern runtimes and operating systems.

