Profiling an application is necessary when you want to learn more about how it runs and where its bottlenecks are.
June 14, 2006
URL:http://www.drdobbs.com/architecture-and-design/profiling-windows-c-applications-with-mi/189401136
Oguz Kupusoglu is a software engineer. He can be contacted at [email protected].
Profiling an application is necessary when you want to learn more about how it runs and where its bottlenecks are. A profiler is an analysis tool that tracks an application while it executes. Although the capabilities of profilers vary, the most essential feature a profiling tool should have is a graphical display of the call cost and the hit count: respectively, the average time elapsed in milliseconds for a function call and the number of times a function is called.
I use Microsoft Visual C++ 6.0 Professional Edition on Windows 2000 and Windows XP. Although this compiler has a profiler, it is a bit cumbersome. Moreover, analyzing the profiler data is not easy. Microsoft has provided the PROFILER.XLM macro to analyze the profiler data. However, I think it is also difficult to use. Consequently, I decided to develop my own simple profiler. The complete source code (and related files) for the profiler is available online.
My primary design goals are:
I decided to develop a class to collect the profiler data and a tool to analyze the data graphically with Microsoft Excel (I tested the tool with Microsoft Excel 2000 and Microsoft Excel 2003). Obviously, I am not after precise measurements; I simply want to compare some numeric data. The CProfiler class is aimed at function-level profiling; it doesn't provide any line-level data.
The tool is tightly coupled to Microsoft Excel. However, I developed the CProfiler class with portability in mind; the SData struct is a wrapper for any platform-specific implementation.
Listing One shows this class. Note the static functions and data members.
struct SData;

class CProfiler
{
public:
    CProfiler(wchar_t* pId);
    CProfiler(void* pAddress, wchar_t* pType, wchar_t* pOp, wchar_t* pId);
    ~CProfiler();

    bool InitInstance();
    void StopInstance();

    static void InitClass(wchar_t* pDir, bool profiling);
    static void StopClass();
    static void DeleteTemps();
    static void ProcessData();

private:
    static bool IsDirOK(wchar_t* pDir);

    SData*   m_start;
    FILE*    m_file;
    bool     m_failed;
    wchar_t* m_pId;

    static SData*  s_frequency;
    static wchar_t s_dir[LEN_BUFFER];
    static bool    s_profiling;
    static bool    s_failed;
    static bool    s_stopped;
    // Thread id -> temporary file. The template arguments were lost in the
    // source text; <DWORD, FILE*> is assumed from the description.
    static std::map<DWORD, FILE*> s_files;
    static void*   s_mutex;
};
To simplify the usage, I developed the macros in Listing Two.
#define PROFILER_INITCLASS(dir)     CProfiler::InitClass(L##dir, true)
#define PROFILER_INITCLASS_CURRDIR  CProfiler::InitClass(0, true)
#define PROFILER_PROCESSDATA        CProfiler::ProcessData()
#define PROFILER_START(id)          CProfiler profiler(L##id)
#define PROFILER_STOP               profiler.StopInstance()
The benefit of the macros is that profiling can be turned on and off easily. If PROFILER_PROFILING is defined, the application will be profiled; if not, the macros simply go away.
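Listing Two shows only the enabled form of the macros. A minimal sketch of how the on/off switch might be arranged follows, assuming the definitions are wrapped in an #ifdef on PROFILER_PROFILING with empty fallbacks. The stub CProfiler and its s_started counter are hypothetical and exist only to make the fragment self-contained.

```cpp
// Stub standing in for the real CProfiler; it has only what the macros touch.
struct CProfiler {
    static int s_started;  // hypothetical counter, for illustration only
    explicit CProfiler(const wchar_t*) { ++s_started; }
    void StopInstance() {}
};
int CProfiler::s_started = 0;

#ifdef PROFILER_PROFILING
    #define PROFILER_START(id)  CProfiler profiler(L##id)
    #define PROFILER_STOP       profiler.StopInstance()
#else
    // Profiling disabled: both macros expand to nothing,
    // so each use site is left with an empty statement.
    #define PROFILER_START(id)
    #define PROFILER_STOP
#endif

// A function instrumented with the macros; its result is the same
// whether or not PROFILER_PROFILING is defined.
int Work() {
    PROFILER_START("Work");
    int sum = 0;
    for (int i = 1; i <= 10; ++i) sum += i;
    PROFILER_STOP;
    return sum;
}
```

With this arrangement the instrumented source does not change between profiled and unprofiled builds; only the compiler definition does.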
Starting with Windows NT, Microsoft operating systems use UNICODE strings internally. If ANSI strings are used, they are converted to UNICODE first. If you use UNICODE strings, no conversion is necessary and thus the performance improves. Consequently, I decided to use UNICODE strings in the CProfiler class at the expense of not supporting the ANSI-only early Windows versions. However, I think this is not a big loss. Note that the macro parameters are prefixed with "L" by the "##" token-pasting preprocessor operator to specify that they are UNICODE strings.
The basic idea is to create one temporary text file per thread and dump the profiler data to it in the comma-separated values (CSV) format. When a CProfiler object is created, it gets its thread id via the Windows GetCurrentThreadId() API call. Then, the object searches the static map of thread ids and temporary files. If a file is already open for the thread, the object uses it. Otherwise, a file named after the "threadId.profiler" pattern is opened and the object uses that. I deliberately selected the "profiler" suffix instead of "csv"; using the obscure "profiler" suffix decreases the possibility of accessing or deleting the wrong files. Note that a mutex controls access to the map.
The class should be initialized, when the application is launched, with a directory where the temporary files and the final file will be created. Before the application is terminated, the class should process all the temporary files to create the final one; see Listing Three.
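The per-thread file lookup can be sketched roughly as follows. This is a portable illustration rather than the CProfiler code itself: it swaps GetCurrentThreadId() and the Windows mutex for std::this_thread::get_id() and std::mutex, and the ThreadFiles class and its member names are hypothetical.

```cpp
#include <cstdio>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: hands out one temporary file per thread,
// opening "<threadId>.profiler" in the given directory on first use.
class ThreadFiles {
public:
    explicit ThreadFiles(std::string dir) : m_dir(std::move(dir)) {}
    ~ThreadFiles() {
        for (auto& kv : m_files) std::fclose(kv.second);
    }

    std::FILE* ForCurrentThread() {
        const std::thread::id tid = std::this_thread::get_id();
        std::lock_guard<std::mutex> lock(m_mutex);  // the map is shared by all threads

        auto it = m_files.find(tid);
        if (it != m_files.end()) return it->second;  // already opened for this thread

        std::ostringstream name;
        name << m_dir << '/' << tid << ".profiler";
        std::FILE* f = std::fopen(name.str().c_str(), "w");
        if (f) m_files[tid] = f;
        return f;  // nullptr if the file could not be opened
    }

private:
    std::string m_dir;
    std::mutex m_mutex;
    std::map<std::thread::id, std::FILE*> m_files;
};
```

Because each thread writes only to its own file, the lock is needed only around the map lookup, not around every write.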
int main()
{
    PROFILER_INITCLASS("C:/Profile_ProjectName");
    ...
    PROFILER_PROCESSDATA;
    return 0;
}
If no directory is specified, the class uses the current directory for the profiled application. In this case, as the "current" directory may be set by some other means, one can end up using an unexpected directory. Then, the developer should use the PROFILER_START and PROFILER_STOP macros within the functions to be profiled; see Listing Four.
void CFoo::Foo()
{
    PROFILER_START("CFoo::Foo");
    ...
    PROFILER_STOP;
}
The CProfiler constructor takes a string id and starts profiling. Initially, I thought that the CProfiler destructor should stop profiling. However, as destructors are not usually called deliberately, this would in effect leave stopping the profiler to the compiler. Obviously, users should be given more control over the scope of the code profiled. Hence, a dedicated function, CProfiler::StopInstance(), is provided to stop profiling.
The time elapsed in milliseconds between the start and stop of the profiling is the all-important profiler data. I use the Windows QueryPerformanceFrequency() API function to get the current performance-counter frequency in counts per second. If the hardware doesn't support it, the CProfiler class does nothing. The Windows QueryPerformanceCounter() API function is called twice: when the profiling is started and when it is stopped. This function retrieves the current value of the high-resolution performance counter. The delta between the stop and start values is stored in the temporary file. Converting the delta values to milliseconds is left to the tool to improve the performance of CProfiler.
The PROFILER_START macro takes a string id for the profiling point and starts the profiling. The function names can be used as the ids. The PROFILER_STOP macro dumps the id and the delta of counts like "CFoo::Foo,961486" to the temporary file for its thread. If the profiling is not stopped, the data dumped will be "CFoo::Foo,".
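The start/stop behavior and the two record shapes can be illustrated with a small portable sketch. It is not the CProfiler implementation: std::chrono::steady_clock stands in for QueryPerformanceCounter(), the output goes to any std::ostream instead of a per-thread file, and the ScopedProbe class is hypothetical. Writing the incomplete record from the destructor is likewise an assumption about how a record such as "CFoo::Foo," could arise.

```cpp
#include <chrono>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical portable analogue of one profiling point.
class ScopedProbe {
    using Clock = std::chrono::steady_clock;

public:
    ScopedProbe(std::ostream& out, std::string id)
        : m_out(out), m_id(std::move(id)), m_start(Clock::now()) {}

    // Explicit stop: writes "id,delta", where delta is in clock ticks.
    void Stop() {
        if (m_stopped) return;
        m_stopped = true;
        const auto delta = (Clock::now() - m_start).count();
        m_out << m_id << ',' << delta << '\n';
    }

    // If Stop() was never reached, leave a record with no delta so that
    // the analysis step can flag the unexpected execution path.
    ~ScopedProbe() {
        if (!m_stopped) m_out << m_id << ",\n";
    }

private:
    std::ostream& m_out;
    std::string m_id;
    Clock::time_point m_start;
    bool m_stopped = false;
};
```

A probe that is stopped explicitly produces a complete record; one that goes out of scope early, for example through an early return, produces a record ending in a bare comma.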
When CProfiler::ProcessData() is executed, a final CSV file such as "profiling_2006.04.26_11.26.53.csv" is opened; the file name format is "profiling_year.month.day_hour.min.seconds.csv". The frequency of the high-performance counter is recorded first, as in "Frequency,3579545", because the tool needs the frequency to convert the delta values to milliseconds. Then, all the temporary files are accessed one by one; their data is copied into the final file and they are deleted. Alternatively, you can first call CProfiler::StopClass() to stop all the profiling operations and call CProfiler::ProcessData() later. Note that once the class is stopped, it no longer generates profiling data. The intention is to run the application under the profiler several times and then process all the final CSV files via the ProcessProfilerData.hta tool. Running an application once under the profiler doesn't give much data!
Records that have no delta value cause the tool to issue warnings, and they are ignored. Although records of this kind give no useful profiling data, they are still important: assuming that the user expects the execution to reach the PROFILER_STOP macros, they show that the execution somehow jumped out of the expected path. I think they are particularly important when you use the CProfiler class as a probing tool to understand some new code rather than to profile well-known code.
I decided to develop a script in VBScript to process the final CSV files. To provide a user-friendly GUI for the script, I embedded the VBScript code in a Dynamic HTML file. While HTML is very good at creating GUIs easily, I hated the unnecessary security warnings issued when I ran the tool. I recalled that Microsoft introduced HTML Applications (HTAs) with Internet Explorer 5. HTML Applications run as trusted applications and as such are not subject to the same security constraints as Web pages. Moreover, HTAs have read/write access to the files on the client machine. It is very easy to convert an HTML file to an HTA file. The HTA:APPLICATION tag and its attributes tell the window how to behave as an application. This tag must appear within the paired HEAD tags; see Listing Five. HTA files require Microsoft Internet Explorer. Note that the tool works very slowly when there are a lot of data, say 50K!
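The conversion that the tool performs is simple arithmetic. The tool itself is written in VBScript; the C++ function below is only a sketch of the calculation, and the name DeltaToMilliseconds is hypothetical.

```cpp
// Convert a performance-counter delta to milliseconds.
// 'frequency' is the counter frequency in counts per second,
// as recorded in the "Frequency,..." line of the final CSV file.
double DeltaToMilliseconds(long long deltaCounts, long long frequency) {
    return static_cast<double>(deltaCounts) * 1000.0 /
           static_cast<double>(frequency);
}
```

Using the sample values quoted in the text, a delta of 961486 counts at a frequency of 3579545 counts per second corresponds to roughly 268.6 milliseconds.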
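How the final file might be reduced to hit counts and totals, with incomplete records counted as warnings, is sketched below. This is an illustration in C++, not the VBScript used by the tool; the Stats and Summary structs and the Aggregate function are hypothetical, and the input is assumed to hold one "id,delta" record per line plus the "Frequency,..." line.

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

struct Stats {
    long long totalCounts = 0;  // sum of deltas, in counter ticks
    int hits = 0;               // number of completed calls (hit count)
};

struct Summary {
    long long frequency = 0;             // counts per second
    std::map<std::string, Stats> byId;   // one entry per profiling id
    int warnings = 0;                    // records with no delta
};

// Hypothetical helper: aggregates the text of a final CSV file.
Summary Aggregate(const std::string& csv) {
    Summary out;
    std::istringstream in(csv);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type comma = line.rfind(',');
        if (comma == std::string::npos) continue;  // not a record
        const std::string id = line.substr(0, comma);
        const std::string value = line.substr(comma + 1);
        if (value.empty()) {        // profiling was started but never stopped
            ++out.warnings;
            continue;
        }
        if (id == "Frequency") {
            out.frequency = std::atoll(value.c_str());
            continue;
        }
        Stats& s = out.byId[id];
        s.totalCounts += std::atoll(value.c_str());
        ++s.hits;
    }
    return out;
}
```

The call cost for an id then follows as totalCounts * 1000 / frequency / hits, in milliseconds.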
<head>
  <title>Process Profiler Data</title>
  <hta:application id="PPD"
                   border="thin"
                   borderstyle="normal"
                   icon=""
                   maximizebutton="yes"
                   minimizebutton="yes"
                   singleinstance="yes"
                   sysmenu="yes"
                   version="1.0"
                   windowstate="maximize"
                   navigable="yes" />
</head>
The ProcessProfilingData.hta tool contains VBScript code embedded into an HTML user interface; see Figure 1. Note that the HTA extension means it is an HTML Application. The GUI works from top to bottom. Initially, all the steps are marked red and only the first step is enabled.
At this step, the user is expected to specify a Microsoft Excel file, which the tool will create. By default, the file name starts with "profiling_". After the user completes the first step, it is marked green and the second step is enabled. Similarly, the third step is enabled when the second step is completed and marked green. At this point, all the user needs to do is press the "Process Data" button. Then, the tool processes all the CSV files found in the specified directory, converts the delta values to milliseconds, draws two charts, and marks the third step green. One chart shows the call cost in milliseconds; see Figure 2.
The other chart shows the hit count; see Figure 3. Besides creating a Microsoft Excel file, the tool also exports the charts as GIF files, so users can easily include the charts in any documentation.
I needed a memory-checking tool while working on a complex project that made heavy use of the memory operators. Initially, I tried to overload the memory operators to check them, but I ran into compilation problems. It occurred to me to leverage my CProfiler class for this purpose. So, I updated the class: the second parameter of CProfiler::InitClass() specifies the usage. If it is true, the class is used for profiling the application; otherwise, it is used to check the memory operators. I also overloaded the constructor. Then, I defined several macros; see Listing Six.
#define MEMORY_NEW(ptr,ptrtype,id)         CProfiler mem_new(ptr, L##ptrtype, L"new", L##id)
#define MEMORY_NEW_ARR(ptr,ptrtype,id)     CProfiler mem_new_arr(ptr, L##ptrtype, L"new[]", L##id)
#define MEMORY_DELETE(ptr,ptrtype,id)      CProfiler mem_delete(ptr, L##ptrtype, L"delete", L##id)
#define MEMORY_DELETE_ARR(ptr,ptrtype,id)  CProfiler mem_delete_arr(ptr, L##ptrtype, L"delete[]", L##id)
At the site of each new, delete, new[], and delete[] operator call, the appropriate macro should be used; see Listing Seven.
int main(int argc, char* argv[])
{
    PROFILER_INITCLASS_CURRDIR;

    int*  pInt  = new int;
    char* pChar = new char[100];
    MEMORY_NEW(pInt, "int", "main");        // respectively Address,Type,Id
    MEMORY_NEW_ARR(pChar, "char", "main");

    delete pInt;
    delete [] pChar;
    MEMORY_DELETE(pInt, "int", "main");
    MEMORY_DELETE_ARR(pChar, "char", "main");

    PROFILER_PROCESSDATA;
    return 0;
}
Certainly, adding them is a very boring task! That's why I created another tool, ProcessMemoryData.hta, to process the memory data thus collected. This tool imports the selected CSV file, creates an XLS file with the same name in the same folder, sorts all the data, and reports the mismatched new/delete and new[]/delete[] uses for each address and type. Its user interface is similar to that of ProcessProfilingData.hta.
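The kind of mismatch report described above can be sketched as follows. This is not the logic of ProcessMemoryData.hta, which sorts the data in Excel; it is a simplified in-order check, and the MemRecord struct, the CheckPairs function, and the message wording are all hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

struct MemRecord {
    std::string address;  // pointer value as text
    std::string op;       // "new", "new[]", "delete", or "delete[]"
};

// Walks the records in order and reports operator mismatches and leaks.
std::vector<std::string> CheckPairs(const std::vector<MemRecord>& records) {
    std::map<std::string, std::string> live;  // address -> allocating operator
    std::vector<std::string> problems;

    for (const MemRecord& r : records) {
        if (r.op == "new" || r.op == "new[]") {
            live[r.address] = r.op;
            continue;
        }
        auto it = live.find(r.address);
        if (it == live.end()) {
            problems.push_back(r.address + ": " + r.op + " without matching allocation");
            continue;
        }
        const std::string expected = (it->second == "new") ? "delete" : "delete[]";
        if (r.op != expected)
            problems.push_back(r.address + ": " + it->second + " released with " + r.op);
        live.erase(it);
    }

    // Anything still in the map was allocated but never released.
    for (const auto& kv : live)
        problems.push_back(kv.first + ": " + kv.second + " never released");
    return problems;
}
```

For the sequence in Listing Seven, where each new is paired with delete and each new[] with delete[], such a check would report nothing.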
Users are expected to run the profiled applications many times under different conditions. Here, the conditions depend on the computer on which the application runs: operating system, number of processes running, network connections, available physical and virtual memory, CPU power, and so on. Perhaps it is a good idea to define a "typical" condition and specifically analyze the application under this condition. For consistent results, users should stop other applications that execute at random intervals. Moreover, users should profile only the specific areas of interest. For example, it may be meaningless to profile the GUI part of the application. Complex algorithms are the prime targets for profiling. The user may discover opportunities to improve them by studying the profiler data carefully. It is at this point that you'll likely find the CProfiler class most useful.