hiprof - CPU-time and page-fault call-graph profiler for performance analysis
hiprof [-cycles | -faults | -pthread | -threads] [hiprof-option...]
[gprof-option...] program [argument...]
See the start of the OPTIONS section below for details of hiprof options
that may be essential for the correct execution of the program.
The atom -tool hiprof interface is still available, for compatibility with
earlier releases. However, it is now undocumented, and it will be retired
in a future release.
See prof_intro(1) for an introduction to the application performance tuning
tools provided with Tru64 UNIX.
The hiprof command creates an instrumented version of a program
(program.hiprof) that produces call-graph and flat profiles of one of a
range of performance statistics:
+ The CPU time spent in each procedure (or optionally, each source line
or instruction), measured by sampling the program counter about every
millisecond (the default)
+ The CPU time spent in each procedure and procedure call, measured as
machine cycles, including the effects of any memory-access delays
(with the -cycles option)
+ The number of page faults suffered by each procedure and procedure
call (with the -faults option)
See the limitations of each performance statistic in the RESTRICTIONS section.
If you specify program arguments (argument...) or -run, the instrumented
program is also executed.
If you specify -display or any gprof-option, the hiprof command runs the
instrumented program and then displays the profile by running the gprof
tool (with any specified gprof-option).
If you omit the program name, a usage message is printed.
The following example shows how to instrument, run, and display the profile
for a multi-threaded program:
cc *.c -pthread -L. -g1 -O2 -o program -lapp1 -lapp2
hiprof -pthread -L. -all program data/*
The -all option requests that all shared libraries be profiled, but
threads-related system libraries cannot be safely instrumented to count
the procedure calls that are needed to print a call graph. By default, these
libraries are still sampled to provide flat CPU-time profiles. The -cycles
and -faults options cannot be used with threaded programs. In profiles made
with those options, however, the displayed time or page-fault count for a
procedure includes the time or count for any procedures that it calls but
that were not selected for instrumentation (for example, any procedures in
libraries not selected by the -all or -incobj options). This means that time
is not lost from these profiles by excluding shared libraries.
File name of a fully linked call-shared or nonshared executable to be
profiled. This program should be compiled with the -g or -gn option
(n>=1) to obtain more complete profiling information. If the default
symbol table level (-g0) is used, line number information, static
procedure names, and file names are unavailable. Inlined procedure calls
are also unavailable. Programs that are stripped or are optimized by
spike or cc -om are not supported.
All arguments following the program name are considered to be arguments
needed by the instrumented program to execute the procedures, lines,
and instructions of interest. Multiple arguments can be specified. They
imply -run if any are specified, and they can be replaced by -run if
none are needed.
Options can be abbreviated to three characters. The gprof-options, which
are provided as alternatives to the -display option, can be abbreviated to
For options that specify a procedure name (proc), C++ procedures can omit
the argument type list, though this will match all overloaded procedures
with that name. To select a specific procedure, specify the full symbol
name (as printed by the nm command). Symbol names containing spaces,
asterisks, and so on must be quoted.
Some or all of these options may be needed to prevent the instrumented pro-
Specify -pthread if the program or any of its libraries calls
pthread_create(3) (for example, if it was compiled with either the
-pthread option or the -threads compatibility option). This will make
the collection of profile data thread-safe.
Specify -fork if the program calls any variant of fork(2). It is not
usually needed if the subprocesses also call any variant of exec(2).
The -fork option ensures that forked multi-threaded programs are
profiled in a thread-safe way, and it produces separate profiling data
files for the forked subprocesses, including the process ID in their
file names as if -pids were specified. Failure to use -fork might lead
to deadlock in the forked child processes.
For compatibility with earlier releases, a default level of fork
support is provided if the executable is non-shared or if libc.so is
instrumented. However, this approach can lead to deadlock and will be
retired in a future release, so specifying -fork is recommended.
By default, the hiprof code running in the program's process allocates
memory for its own use at address 38000000000. If the program needs to
use memory between 38000000000 and 3ff00000000, specify the address
that the hiprof code should use.
Specify -sigdump to force the instrumented program to write the current
profile data to its file(s) on receipt of the named signal. By
default, the program writes the profiling data file(s) only when the
process terminates, but some processes never terminate normally, so
this option lets you generate the file(s) on demand. After a file is
written, the instruction counts of the profile are all set to zero; so
by sending two signals, any interval of a test run can be profiled,
with the second signal's file(s) overwriting the first. For example, to
use the default kill pid command to signal the program, specify
-sigdump TERM. Choose a signal that the program does not use for another
purpose.
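The two-signal technique described above can be sketched as a small shell helper. This is an illustrative sketch only: profile_interval is a hypothetical function name, the warm-up and interval lengths are arbitrary, and it assumes the program was instrumented with -sigdump TERM so that the default signal sent by kill triggers a dump.

```shell
# profile_interval PID WARMUP MEASURE   (hypothetical helper, not part of hiprof)
# Assumes the target was instrumented with: hiprof -sigdump TERM program
# and started as: ./program.hiprof &
profile_interval() {
    pid=$1
    sleep "$2"                # let the uninteresting start-up phase pass
    kill "$pid" || return 1   # first TERM: dump the profile so far, counts reset to zero
    sleep "$3"                # the interval of interest
    kill "$pid"               # second TERM: file(s) overwritten with the interval's profile
}

# Example use (commented out because it needs a running instrumented program):
#   ./program.hiprof & profile_interval $! 60 30
```

After the second signal, the profiling data file(s) should describe only the interval between the two signals, because the first dump zeroes the counts.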
Profiling Statistics Options
Profiles CPU time by counting the machine cycles used in each procedure
call. Use this option only for non-threaded programs.
Profiles page faults suffered by each procedure instead of the default
time spent in each procedure. Use this option only for non-threaded
programs.
File Generating Options
Does not print informational and progress messages on the standard
-v Prints the command lines used to instrument the program and to execute
the instrumented program. Prints the names of any procedures that were
not instrumented.
Names the instrumented program file instead of the default
program.hiprof.
Specifies the directory to which the instrumented program writes the
profiling data file(s) for each test run. The default is the current
directory.
Adds the process ID of the instrumented program's test run to the name
of the profiling data file produced (that is, program.pid.hiout). By
default, the file is named program.hiout.
When profiling a threaded program, specify -threads to produce a
separate profile for each pthread in the program. The files are named
program[.pid].sequence.hiout, where sequence is the thread sequence
number assigned by pthread_create(3). The -threads option implies the
-pthread option. If -sigdump is needed, -pthread is recommended instead
of -threads, to avoid possible synchronization problems.
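The naming scheme for profiling data files can be summarized with a short shell sketch. The function hiout_name is a hypothetical helper written for illustration, and the process ID and sequence number passed to it are made-up values. It only composes the pattern program[.pid][.sequence].hiout described above.

```shell
# hiout_name PROGRAM [PID] [SEQUENCE]   (hypothetical helper, not part of hiprof)
# Composes the profiling data file name from its optional parts.
hiout_name() {
    name=$1
    if [ -n "${2:-}" ]; then name="$name.$2"; fi   # process ID, as added by -pids
    if [ -n "${3:-}" ]; then name="$name.$3"; fi   # thread sequence number, as added by -threads
    printf '%s.hiout\n' "$name"
}

hiout_name program            # default:               program.hiout
hiout_name program 1234       # with -pids:            program.1234.hiout
hiout_name program "" 2       # with -threads:         program.2.hiout
hiout_name program 1234 2     # with -pids, -threads:  program.1234.2.hiout
```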
Shared-Library Profiling Options
Profiles all the shared libraries in addition to the program's executable.
If -all was specified, does not profile the shared library lib. Can be
repeated, to exclude multiple libraries.
Profiles the shared library lib. Can be repeated to include multiple
libraries.
Searches for shared libraries in the specified directory before
searching the default directories. Can be repeated to make a search path. Use
the same options that were used when linking the program with ld.
Does not instrument the procedure proc. This option can be used to
exclude procedures that are uninteresting or that interfere with the
instrumentation (such as non-standard assembly code).
Execution Control Options
Executes the instrumented program, even if no arguments are specified.
By default, the program is just instrumented for later execution.
Prints the tool's version number.
Executes the instrumented program, and runs gprof with default options
on the resulting .hiout file(s).
Executes the instrumented program, and runs gprof on the resulting
.hiout file(s). The following gprof options are supported:
Profiles each instruction within selected procedures.
Does not report on called procedures.
Excludes procedure proc and its descendants from the profile, but
totals all procedures.
Includes only procedure proc and its descendants in the profile,
but totals all procedures.
Profiles procedures as an indexed call graph (default).
Profiles source lines, listing the most heavily used first.
Profiles source lines, in order within selected procedures.
Merges all .hiout input files into file.
Prints each procedure's starting line number.
Profiles procedures, listing the most heavily used first (default).
Profiles the whole executable and any shared libraries.
Reports procedures that were never called.
If hiprof finds any previously instrumented shared libraries in the working
directory, it will reuse them if they meet current requirements, to reduce
Temporary instrumentation files are created in /tmp. Set the TMPDIR
environment variable to a different directory to create the files
elsewhere, for example in a disk partition with more space.
The default sampled profile only estimates the CPU time spent in each
procedure call; profiles made with the -cycles and -faults options measure it.
When timing a program's procedures by measuring machine cycles (with the
-cycles option), the 32-bit cycle-counting hardware will wrap if the
program goes more than a few seconds without executing a procedure call or
return (for example, because of a long-running loop). If the counter wraps,
the profile will be incorrect. Using the -all or -incobj options to profile
all non-system libraries and procedures can help avoid this restriction.
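To get a feel for "every few seconds", the wrap interval of a 32-bit counter can be worked out with shell arithmetic. This is a rough sketch under stated assumptions: the counter is assumed to advance once per clock cycle, wrap_ms is a hypothetical helper name, and the 500-MHz clock rate is used only as an example value. The shell must support 64-bit integer arithmetic.

```shell
# wrap_ms CLOCK_MHZ   (hypothetical helper, not part of hiprof)
# Milliseconds until a 32-bit counter wraps, assuming one count per clock cycle:
#   2^32 cycles / (CLOCK_MHZ * 1000 cycles per millisecond)
wrap_ms() {
    echo $(( 4294967296 / ($1 * 1000) ))
}

wrap_ms 500   # prints 8589, that is, about 8.6 seconds at 500 MHz
```

Under these assumptions, a loop that runs for more than about 8.6 seconds without any procedure call or return would be enough to wrap the counter on a 500-MHz processor, and faster clocks shorten the interval proportionally.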
The -cycles option generates an inaccurate profile if the instrumented
program is run on a system whose processors have different cycle speeds. This
inaccuracy can be avoided by using hiprof's default sampling profiler or
the cc -p/-pg profilers instead, or by running the application on a subset
of the processors:
+ Select a single processor using the runon command.
+ Check the processor speeds using the psrinfo -v command and run the
application in a processor set comprising only processors that run at
the same speed (see processor_sets(4)).
Approximate performance estimates are as follows but will vary according to
the application and the machine's CPU count, type, and clock rate. The
hiprof instrumentation takes ~2s per Mb of program file on a 500-MHz EV6
(21264) Alpha system, using ~10 Mb of memory plus another ~10 Mb per Mb of
the largest file. The instrumented files are ~20% larger than the
originals, plus ~1 Mb of hiprof code. They run ~4 times slower. By default, each
profile data file is at least the size of the instrumented code (and uses
this much memory), but these files are very small for the -cycles and
-faults options.
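The rules of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: hiprof_estimate is a hypothetical helper, the 5-Mb sizes are made-up inputs, and the figures apply only to the 500-MHz EV6 reference system quoted above. Actual results will vary with the application and machine.

```shell
# hiprof_estimate PROGRAM_MB LARGEST_FILE_MB   (hypothetical helper, not part of hiprof)
# Applies the approximate figures quoted above, using integer arithmetic.
hiprof_estimate() {
    prog_mb=$1
    largest_mb=$2
    echo "instrumentation time: ~$(( 2 * prog_mb )) s"               # ~2 s per Mb of program file
    echo "instrumentation memory: ~$(( 10 + 10 * largest_mb )) Mb"   # ~10 Mb plus ~10 Mb per Mb of largest file
    echo "instrumented size: ~$(( prog_mb * 120 / 100 + 1 )) Mb"     # ~20% larger plus ~1 Mb of hiprof code
}

hiprof_estimate 5 5
# instrumentation time: ~10 s
# instrumentation memory: ~60 Mb
# instrumented size: ~7 Mb
```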
If a procedure contains interprocedural branches or interprocedural jumps,
that procedure will not be instrumented with the -cycles or -faults option,
and no information will be reported about that procedure. Use the -v option
to see which procedures were not instrumented. Compilers can optimize
return statements or non-returning function calls to interprocedural
branches. To avoid this, recompile with the -O0 or -no_inline option.
Instrumented version of program produced by hiprof
Profile data file produced by program.hiprof
Instrumented shared libraries produced by hiprof
Temporary file created and deleted in the current and -dirname path
atom(1), cc(1), dxprof(1), fork(2), gprof(1), kill(1), ld(1), pixie(1),
processor_sets(4), psrinfo(1), pthread(3), runon(1), uprofile(1). (dxprof
is available as an option.)