Although the name might not suggest it, if you develop C or C++ code in Microsoft Visual Studio, Intel Parallel Studio may be just what you're looking for even if you have no intention of parallelizing your code.

If you need any of these features:

- Detailed memory access checking and leak detection
- Fully integrated performance analysis
- Optimizing C/C++ compiler
- Highly optimized functions for audio and video processing, signal processing, or compression

then Parallel Studio is for you.

Let's take those one by one, using a simple code as an example. The code is trying to solve the logic problem which is described like this:

*Find a number consisting of 9 digits in which each of the digits from 1 to 9 appears only once. This number must also satisfy these divisibility requirements:
*

*
*

- The number should be divisible by 9.
- If the rightmost digit is removed, the remaining number should be divisible by 8.
- If the rightmost digit of the new number is removed, the remaining number should be divisible by 7.
- And so on, until there's only one digit (which will necessarily be divisible by 1).

I actually solve the more general problem in which I have an N digit number in base N+1. Here is divisible.cpp, the sample code:

// Written by Jim Cownie (Intel), placed in the public domain. // // Beware, this code has bugs (both deliberate, and, probably, non-deliberate!), // though it does produce the right answer... // #include "stdafx.h" // Solve the logic problem which is to find a number consisting of 9 digits // in which each of the digits from 1 to 9 appears only once. This number // should satisfy the following requirements: // // a. The number should be divisible by 9. // b. If the rightmost digit is removed, the remaining number should be divisible by 8. // c. If then again the rightmost digit is removed, the remaining number should be divisible by 7. // d. etc. until the last remaining number of one digit which should be divisible by 1. // // We actually solve the more general problem in which we have an N digit number in base N+1, // since generality is good. #include <iostream> #include <string> using namespace std; typedef unsigned long long uint64; class number { int nDigits; // number of digits in the number int * digits; // space for the digits of the number public: // Null constructor number(int n) { nDigits = n; digits = new int [nDigits]; for (int i=0; i<nDigits; i++) digits[i] = 0; } // Copy constructor number (const number & other) { nDigits = other.nDigits; digits = new int [nDigits]; for (int i=0; i<=nDigits; i++) digits[i] = other.digits[i]; } // Set the value of a specific digit of the number void setDigit(int digit, int value) { digits[digit] = value; } // Obtain the value of the nd leftmost digits of the number, // or the whole number by default. uint64 value (int nd=0) const { if (nd == 0) nd = nDigits; uint64 v = 0; for (int i=0; i<nd; i++) v = v*(nDigits+1) + digits[i]; return v; } // Format the leftmost nd digits of the number, or the whole number by default. string format(int nd=0) const { if (nd == 0) nd = nDigits; string s; for (int i=0; i<nd; i++) { int d = digits[i]; char c = d < 10 ? '0'+d : 'A' + (d-10); s += c; } return s; } // Test whether the number meets the divisibilty criteria. bool validate() const { for (int i=nDigits; i>=1; i--) { if ((value(i) % i) != 0) return false; } return true; } }; static bool * usedDigits; static int numDigits = 9; // Build all possible numDigits digit numbers which have unique digits // and test whether they meet the required criteria. // We recursively walk the tree of possible numbers, maintaining the // information about what digits have already been used in "usedDigits". static void generateAndTest(number n, int depth) { if (depth == numDigits) { if (n.validate()) { // We have a solution. Print it, and demonstrate that // it meets the divisibility criteria. cout << "Solution is " << n.format() << endl; for (int i=numDigits; i>=1; i--) { cout << n.format (i) << " % " << i << " = " << n.value(i)%i << endl; } } return; } // Try all the available digits for this position in the number. for (int i=0; i<numDigits; i++) { if (usedDigits[i]) continue; usedDigits[i] = true; n.setDigit(depth, i+1); generateAndTest(n, depth+1); usedDigits[i] = false; } } int _tmain(int argc, _TCHAR* argv[]) { if (argc > 1) numDigits = _wtoi(argv[1]); usedDigits = new bool [numDigits]; for (int i=0; i<numDigits; i++) usedDigits[i] = false; number n(numDigits); generateAndTest(n, 0); return 0; }

As written, the code seems to work; after all, it produces the right answer. However, maybe there are still problems. I'll use Parallel Inspector to see.

### Memory Debugging

Once you have installed Parallel Studio, you will find additional toolbars in your Visual Studio:

For correctness issues like memory debugging, use the Parallel Inspector tool whose toolbars are tagged with the symbol . ("In" here stands for "Inspector", not the element Indium). When I click on the "Inspect" button, a dialog window opens to ask how strict I want to be. Since I'm trying to find all the problems I can, I turn the analysis up to be as intensive as it can be, by moving the slider to the bottom. This is the most time-consuming test mode, but since the code is relatively small that's fine.

Hitting the Run Analysis button executes the code and produces the results.

Ouch! There are a lot of problems, despite the code seeming to be correct.

A quick look here shows that we have two invalid memory accesses at line 49 in divisible.cpp, and many memory leaks, mostly at line 46.

Obviously I need some more help, so I hit the Interpret Results button.

Let's fix the memory access problems first, since those seem more dangerous than the leaks. Double clicking on the highlighted problem line shows the source code, an, below that, more detail about the problem.

Applying my brain, I can see that the loop upper bound test here is wrong, I have **i<=nDigits** but looking at the constructor, I can see that I allocated the digits array with only **nDigits** elements, so I'm accessing beyond the end of the array. The loop condition should be **i<nDigits**. So, the copy constructor should look like this (with the change from <= to <** highlighted**). You can get straight to a source editing window by double-clicking any line in the Source View.

// Copy constructor number (const number & other) { nDigits = other.nDigits; digits = new int [nDigits]; for (int i=0; i<nDigits; i++) digits[i] = other.digits[i]; }

That bug nicely explains both of the invalid memory accesses, since the same line is both reading and writing the digits array.

Now that I've fixed the invalid accesses, what about the memory leaks?

I investigate them by switching back to the Inspector view by clicking on the tab for the experiment (in this case **r005mi4**). Now, I can select one, and then switch from the Overview to the Sources view by selecting the Sources box. That shows me a display like this:

I can now see that the source of the leak is the point in the constructor for number where I allocate space for the digits. If I look at the other leaks, (which are shown as separate errors because they occur with different call stacks), I see that all except one are caused by the same allocation, and the one which is different is in the other constructor, where it is making the same allocation. Hmm, time to read the code again…

Aha, I have committed a cardinal sin. I have a constructor which allocates space, but no destructor for the object! That's easy enough to fix, I just have to add the obvious destructor:

// Destructor ~number() { delete [] digits; }

Once I've done that I can re-run the memory inspector to confirm that I have fixed all the problems which it can find.

This = is getting quite long, so I'll defer looking at performance until next week, when I'll also think a bit more about the complexity of the problem, and whether I really need all the code I have.

If you're feeling keen, you can play with Parallel Amplifier by then, and see if you can find the same performance problems with my code that I do (unfortunately there are no prizes!)