Profiling Cygwin Programs

Introduction

Profiling is a way to analyze your program to find out where it is spending its time. You might need to do this if it seems your program is taking more time to do its job than you think it should. It is always preferable to profile your program than to just guess where the time is being spent; even expert programmers are known to guess badly at this.

In Cygwin, you enable profiling with a compiler flag and you display the resulting profiling data with gprof. Read on to find out how.

To enable profiling of your program, first compile it with an additional gcc flag: -pg. That flag should be used when compiling every source file of the program. If your program has a Makefile, you would add the flag to all gcc compilation commands or to the CFLAGS= setting. A manual compilation that enables profiling looks like this:

bash$ gcc -pg -g -o myapp myapp.c

The -pg flag causes gcc to do two additional things as it compiles your program. First, a small bit of code is added to the beginning of each function that records its address and the address it was called from at run time. gprof uses this data to generate a call graph. Second, gcc arranges to have a special "front end" added to the beginning of your program. The front end starts a recurring timer and every time the timer fires, 100 times per second, the currently executing address is saved. gprof uses this data to generate a "flat profile" showing where your program is spending its time.

After compiling your program (and linking it, if you do that as a separate step), you are ready to profile it. Just run it as you normally would. If there are specific code paths you want to profile, take the actions that would exercise those code paths. When your program exits, you will have an additional file in the current directory: gmon.out. That file contains the profiling data gprof processes and displays.

gprof has many flags to control its operation. The [gprof man page] details everything gprof can do. We will only use a few of gprof's flags here. You launch gprof as follows:

bash$ gprof [flags] appname [datafile]...

If you don't specify any flags, gprof operates as if you gave it flags -p -q which means: generate a flat profile with descriptive text and generate a call graph with more descriptive text. In the examples below we will give specific flags to gprof to demonstrate specific displays. We'll also use flag -b which means: be brief, i.e. don't display the descriptive text. You can also specify a trailing list of one or more profiling data files. If you don't, gprof assumes gmon.out is the only file to process and display.

Examples

Example 4.7. Flat profile

bash$ gprof -b -p myapp
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 25.11     13.34    13.34        1    13.34    13.34  func0
 25.00     26.62    13.28        1    13.28    13.28  func1
 25.00     39.90    13.28        1    13.28    13.28  func3
 24.89     53.12    13.22        1    13.22    13.22  func2
 

Example 4.8. Call graph

bash$ gprof -b -q myapp
                        Call graph


granularity: each sample hit covers 4 byte(s) for 0.02% of 53.12 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00   53.12                 main [1]
               13.34    0.00       1/1           func0 [2]
               13.28    0.00       1/1           func1 [3]
               13.28    0.00       1/1           func3 [4]
               13.22    0.00       1/1           func2 [5]
-----------------------------------------------
               13.34    0.00       1/1           main [1]
[2]     25.1   13.34    0.00       1         func0 [2]
-----------------------------------------------
               13.28    0.00       1/1           main [1]
[3]     25.0   13.28    0.00       1         func1 [3]
-----------------------------------------------
               13.28    0.00       1/1           main [1]
[4]     25.0   13.28    0.00       1         func3 [4]
-----------------------------------------------
               13.22    0.00       1/1           main [1]
[5]     24.9   13.22    0.00       1         func2 [5]
-----------------------------------------------


Index by function name

   [2] func0                   [5] func2
   [3] func1                   [4] func3
 

Example 4.9. Source line profile

bash$ gprof -b -l myapp
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 25.11     13.34    13.34        1    13.34    13.34  func0 (myapp.c:9 @ 1004010e0)
 25.00     26.62    13.28        1    13.28    13.28  func1 (myapp.c:10 @ 10040111a)
 25.00     39.90    13.28        1    13.28    13.28  func3 (myapp.c:12 @ 10040118e)
 24.89     53.12    13.22        1    13.22    13.22  func2 (myapp.c:11 @ 100401154)


                        Call graph


granularity: each sample hit covers 4 byte(s) for 0.02% of 53.12 seconds

index % time    self  children    called     name
               13.34    0.00       1/1           main (myapp.c:26 @ 10040123c) [1]
[2]     25.1   13.34    0.00       1         func0 (myapp.c:9 @ 1004010e0) [2]
-----------------------------------------------
               13.28    0.00       1/1           main (myapp.c:26 @ 10040123c) [1]
[3]     25.0   13.28    0.00       1         func1 (myapp.c:10 @ 10040111a) [3]
-----------------------------------------------
               13.28    0.00       1/1           main (myapp.c:28 @ 100401246) [5]
[4]     25.0   13.28    0.00       1         func3 (myapp.c:12 @ 10040118e) [4]
-----------------------------------------------
               13.22    0.00       1/1           main (myapp.c:27 @ 100401241) [7]
[6]     24.9   13.22    0.00       1         func2 (myapp.c:11 @ 100401154) [6]
-----------------------------------------------


Index by function name

   [2] func0 (myapp.c:9 @ 1004010e0) [6] func2 (myapp.c:11 @ 100401154)
   [3] func1 (myapp.c:10 @ 10040111a) [4] func3 (myapp.c:12 @ 10040118e)
 

Special situations

Profiling multi-threaded programs

Multi-threaded programs are profiled just like single-threaded programs. There is no mechanism to turn profiling on or off for specific threads. gprof combines the data for all threads when generating its displays.

Profiling programs that fork

Programs that fork, i.e., use the fork() system call with or without using exec() afterwards, require special care. Since there is only one gmon.out file, profiling data from the parent process might get overwritten by the child process, or vice-versa, after a fork(). You can avoid this by setting the environment variable GMON_OUT_PREFIX before running your program. If the variable is non-empty, its contents will be used as a prefix to name the profiling data files. Here's an example:

Example 4.10. 

bash$ export GMON_OUT_PREFIX=myapp.out
bash$ ./myapp -fork
bash$ ls myapp.out*
myapp.out.2728  myapp.out.3224

bash$ gprof -bp myapp myapp.out.2728
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 50.25     30.28    30.28        2    15.14    15.14  func3
 24.99     45.34    15.06        1    15.06    15.06  func1
 24.76     60.26    14.92        1    14.92    14.92  func2

bash$ gprof -bp myapp myapp.out.3224
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 49.25     29.36    29.36        2    14.68    14.68  func3
 25.43     44.52    15.16        1    15.16    15.16  func1
 25.33     59.62    15.10        1    15.10    15.10  func2

bash$ gprof -bp myapp myapp.out*
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 49.75     59.64    59.64        4    14.91    14.91  func3
 25.21     89.86    30.22        2    15.11    15.11  func1
 25.04    119.88    30.02        2    15.01    15.01  func2
 

As the last gprof command above shows, gprof can combine the data from a selection of profiling data files to generate its displays. Just list the names of those files at the end of the gprof command; you can use a wildcard here. NOTE: If you update your program, remember to remove stale profiling data files before profiling your program again. If you aren't careful about this, gprof could combine data from your most recent version with stale data from prior versions, possibly giving misleading displays.

Getting better profiling resolution

To get better resolution (i.e., more data points) when profiling your program, try running it multiple times with the environment variable GMON_OUT_PREFIX set, as described in the previous situation. There will be multiple profiling data files generated and you can have gprof combine the data from all of them into one display.

Profiling programs with their libraries

At the time of this writing Cygwin's profiling support only allows for one range of addresses per program. It is hard-wired to be the range covering the .text segment of your program, which is where your code resides. If you build your program with static libraries (e.g., libfoo.a), the code from those libraries is linked into your program's .text segment so will be included when profiling. But dynamic libraries (e.g., libfoo.dll) reside in other address ranges and code within them won't be included.

Profiling Cygwin itself

Due to the issue mentioned in the previous situation and other issues, at the time of this writing there is no support for profiling Cygwin itself.