uprofile, kprofile - Profile a program (uprofile) or kernel (kprofile) with
Alpha on-chip performance counters
uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all | -each | -one]
[-stride n] [-average] [-pixie] [-display | prof-option...] [statistic...]
kprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all | -each | -one]
[-stride n] [-average] [-pixie] [-display | prof-option...]
[-k kernel_name] [-t] [-ra] [statistic...] [program [argument...]]
See prof_intro(1) for an introduction to the application performance tuning
tools provided with Tru64 UNIX.
The uprofile command uses the Alpha on-chip performance counters to produce
a finely-grained program-counter profile of a user program. The command
runs the program you specify with the arguments you specify, collecting the
selected statistics on the program's process and its descendants. It writes
the profile data to the umon.out file, by default. If the program calls
shared libraries, those libraries are not profiled.
The kprofile command uses the Alpha on-chip performance counters to produce
a detailed program-counter profile of the kernel. If you specify a program,
kprofile runs the program with the arguments you specify, and it collects
the selected statistics on the kernel for the duration of the program's
execution. If you do not specify a program, kprofile collects the selected
statistics on the kernel until you enter Ctrl/C or the kprofile process
receives a SIGTERM signal. Note that if SIGINT (usually generated by entering
Ctrl/C at the controlling terminal) is currently being ignored, it will
continue to be ignored and SIGTERM must be used to terminate data
collection. kprofile writes the profile data to the kmon.out file, by
default.
If you specify -display or any of the prof-options, the uprofile and
kprofile commands display the profile by running the prof tool (with any
You can also run the prof command separately, to help analyze the data in
the umon.out or kmon.out file. The following examples show how to invoke
the prof command to analyze data in the respective files:
% prof a.out umon.out
% prof /vmunix kmon.out
The CPU-time profile displayed by prof will not be accurate if the
processors that executed the application do not all run at the same speed,
as in certain multiprocessor systems containing EV67 or later processors.
The inaccuracy may be avoided by using the hiprof (sampling) or cc -p/-pg
profilers, or by running the application on a subset of the processors:
+ Select a single processor using the runon command.
+ Check the processor speeds using the psrinfo -v command and run the
application in a processor set comprising only processors that run at
the same speed (see processor_sets(4)).
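The speed check in the second item can be scripted. The following is a minimal sketch in POSIX shell, not part of the profiling tools. The function names distinct_speeds and same_speed are hypothetical, and the sketch assumes that each processor's entry in the psrinfo -v output contains the phrase "operates at <n> MHz". Verify that wording on your system before relying on it.

```sh
# distinct_speeds: read psrinfo -v style text on stdin and print each
# distinct clock speed (in MHz) once.
# ASSUMPTION: every processor entry contains "operates at <n> MHz".
distinct_speeds() {
    sed -n 's/.*operates at \([0-9][0-9]*\) MHz.*/\1/p' | sort -n | uniq
}

# same_speed: succeed only if exactly one distinct speed was reported.
same_speed() {
    count=$(distinct_speeds | wc -l | tr -d '[:space:]')
    [ "$count" -eq 1 ]
}
```

A possible use is `psrinfo -v | same_speed || echo "mixed CPU speeds"`. If the speeds differ, restrict the run with runon or a processor set as described above.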
The name of an event that your particular Alpha hardware can profile,
as detailed in the STATISTICS section, below. If no statistic is named,
machine cycles are counted, giving a CPU-time profile. One statistic
can be specified for each of the hardware counters on your machine.
The name of the executable to run while profiling operations are being
An argument to pass to the program that is run. Multiple arguments can
be specified, as needed by the program.
Options can be abbreviated to three characters, except the prof-options,
which can be abbreviated (usually to one character) as in a prof command.
For example, -qui is interpreted as -quiet, but -q is interpreted as -quit.
(See the -display option for the supported prof-options.)
For options that specify a procedure name (proc), C++ procedures can omit
the argument type list, though this will match all overloaded procedures
with that name. To select a specific procedure, specify the full symbol
name (as printed by the nm command). Symbol names containing spaces, *, and
so on must be quoted.
-v Engages verbose mode, which prints some useful information about the
program being profiled.
Prevents informational and progress messages from being printed.
Specifies the directory path in which the profiling data file or files
[Disables] or enables the addition of the process-id number to the name
of the profiling data file or files.
Specifies which mode to use for profiling on multiprocessor machines.
Using the -all option (the default) aggregates the data for all CPUs
into one umon.out file. Using the -each option collects separate profiles
for each CPU and writes the output into a set of files named
umon.out.n, where n is the CPU number. Using the -one option profiles
only the current CPU. For the -one option to work, the uprofile or
kprofile program must be run using the runon command.
Sets the granularity of the sample counts, where n is the number of
consecutive instructions grouped together for each sample count. The
default is -stride 4. The -asm, -heavy, and -lines prof-options need a
separate sample count for each instruction (for their reports to be
precise enough), so these options imply -stride 1. This makes the output
file four times bigger than the default size. The -stride argument
must be a power of two (for example, 1, 2, 4, 8).
Attempts to average samples within basic blocks so that each
instruction within a basic block will show the same number of samples.
Ensures fine grain profiles by setting stride to 1.
Produces .Addrs and .Counts files similar to those produced by running
an executable instrumented with pixie (see pixie(1)). Uses cycles0
statistic (freq on EV67) by default. Ensures fine grain profiles by
setting stride to 1.
Overrides the name of the kernel to profile. (The default is the booted
-t Enables triggered mode for kprofile. This option sets up all required
information for running the performance counters, but does not invoke
them. See the STATISTICS section for additional information.
-ra Enables PCNTCALLER mode for kprofile. Collects profiling data on the
caller of certain kernel utility routines (for example, bcopy, bzero,
simple_lock), instead of the routine itself.
Runs prof on the resulting profile data file(s). The following prof
options are supported:
Reports the profile as an annotated disassembly.
Excludes procedure proc from the profile but includes its CPU time
or other statistic in the total.
Excludes procedure proc from the profile and from the total.
Profiles source lines, printing those with the highest CPU time or
other statistic first.
Reports the profile per source line within each procedure.
Merges all profile data files into file.
Prints each procedure's starting line number.
Includes only procedure proc in the profile, but totals all pro-
Includes only procedure proc in the profile and in the total.
Profiles procedures, printing those with the highest CPU time or
other statistic first.
-quit n [[cum]%]
Truncates the reports after n lines or after (cumulative) n percent
of the whole.
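The -stride option described above accepts only a power of two. A wrapper script can check the value before invoking the profiler. The following is a minimal sketch in POSIX shell. The function name is_power_of_two is hypothetical and is not part of uprofile or kprofile.

```sh
# is_power_of_two N: succeed when N is a positive decimal integer that is
# a power of two (1, 2, 4, 8, ...).
is_power_of_two() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}
```

For example, a script could run `is_power_of_two "$n" && uprofile -stride "$n" a.out` and print its own error message otherwise.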
You specify the statistics that you want to collect for the program being
profiled in one or more statistic operands.
If you specify multiple statistics, uprofile and kprofile accumulate their
results in a single buffer. You cannot then view the results of any single
statistic separately, so a profile that combines several statistics may be
difficult to interpret.
The Alpha architecture implemented on your machine determines which
statistics can be collected and the number of counters available for collecting
multiple statistics at the same time. The implementation is indicated by
the Alpha chip number, which can be displayed with the show config console
command before booting Tru64 UNIX, or, after booting, by using the psrinfo
-v command, or by calling getsysinfo (GSI_PROC_TYPE). Also, if the uprofile
command is run without arguments, it will show how many counters and what
statistics are available on your machine.
All of the chips in the EV4 family (21064 [EV4], 21064A [EV45], 21066/21068
[LCA4]) have two performance counter registers, each of which can be
separately programmed. The statistics that each counter can collect are
shown in the following table:
All of the chips in the EV5 family (21164 [EV5], 21164A [EV56], and 21164PC
[PCA56]) have three performance counter registers, each of which can be
separately programmed. Some of the counters are common to all EV5
implementations, some are specific to EV5 and EV56, and some are specific to
PCA56.
The statistics that each of the common EV5 counters can collect are shown
in the following table:
Counter0Stats   Counter1Stats   Counter2Stats
0disabled       1disabled       2disabled
cycles0         nonissues       longstalls
issues          splitissue      pcmispredicts
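Because each hardware counter accepts only one statistic, a list of statistic operands can be checked against the table before profiling. The following is a minimal sketch in POSIX shell that covers only the common EV5 names shown in the table above. The function names counter_of and check_stats are hypothetical and are not part of uprofile or kprofile.

```sh
# counter_of STAT: print the counter index (0, 1, or 2) that STAT occupies.
# Only the common EV5 names listed in the table above are recognized.
counter_of() {
    case $1 in
        0disabled|cycles0|issues)            echo 0 ;;
        1disabled|nonissues|splitissue)      echo 1 ;;
        2disabled|longstalls|pcmispredicts)  echo 2 ;;
        *) return 1 ;;
    esac
}

# check_stats STAT...: succeed if every name is recognized and no counter
# is claimed by more than one statistic.
check_stats() {
    used=""
    for stat in "$@"; do
        c=$(counter_of "$stat") || {
            echo "unknown statistic: $stat" >&2
            return 1
        }
        case $used in
            *"$c"*)
                echo "counter $c selected twice: $stat" >&2
                return 1
                ;;
        esac
        used="$used$c"
    done
}
```

For example, `check_stats cycles0 nonissues longstalls` succeeds, while `check_stats cycles0 issues` fails because both names belong to counter 0.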
The statistics that each of the EV5- and EV56-specific counters can collect
are shown in the following table:
The statistics that each of the PCA56-specific counters can collect are
shown in the following table:
The EV6 chip has two performance counter registers, each of which can be
separately programmed. The statistics that each of the EV6-specific
counters can collect are shown in the following table:
The default is to gather cycle statistics in the 0th counter and to disable
The EV67 chip has two kinds of performance counters: traditional aggregate
counters and profile-me counters. The traditional aggregate statistics that
each of the EV67-specific counters can collect are shown in the following
table. Any one statistic or statistic combination may be selected.
If no aggregate statistics are selected, one profile-me statistic may be
2disabled abort abort_per_ret arith_trap
cbr_taken cbr_taken_per_ret cycles cycles_per_ret
delay delay_per_ret dstream_fault dtb_miss
dtb_miss_per_ret dtb_miss3 dtb_miss4 early_kill
early_kill_per_ret fp_disabled freq icache_miss
icache_parity inflt_bcache inflt_replays
inflt_retires interrupt istream_accvio itb_miss
ldst_order ldst_unalign map_stall map_stall_per_ret
mispredict opcdec replay_trap
retire trap trap_per_ret
The default is to gather cycle statistics in the 0th counter and to disable
For descriptions of the statistics for all EV4, EV5, and EV6
implementations, refer to pfm(7).
You can disable any counter by specifying 0disabled, 1disabled, or
2disabled as the counter statistic. You can use this feature to isolate
specific event types, such as loads, without extraneous data being
generated. You cannot disable all counters at the same time, choose two
statistics for the same counter, or disable a counter once its statistic is
When you specify no counter statistics, uprofile and kprofile count cycles
on counter 0 by default, and display (through prof) a profile in terms of
seconds used by each procedure in the program, except for any shared
For noncycle statistics, the displayed profile shows the number of samples
recorded, the sampling interval (events per second), and the total number
of events that this implies. Most noncycle statistics of the EV5 family
CPUs are recorded about six cycles after the instruction that triggered the
sample. So, when using prof's -asm or -lines option, the samples should be
associated with one of the few previously executed instructions or lines.
The icacheacc, icachemisses, and dtbmisses statistics are usually attri-
To perform a detailed analysis of short sections of kernel code, use the
kprofile command with triggered mode (invoked with the -t option). When you
use this mode, kprofile performs all of the required setup for enabling the
counters as normal, but does not invoke them. You can insert counter start
or stop commands into the kernel code to be instrumented as follows:
Turn counters on: wrperfmon (PFOPT, 1)
Turn counters off: wrperfmon (0)
You can turn the counters on and off repeatedly to collect data over many
iterations or multiple sections of code.
The macro PFOPT is defined in <sys/pfcntr.h>.
The interrupt load that profiling places on the system may affect
performance, but usually the effect is insignificant.
The kernel in use must have the pfm pseudo-device configured into it. To do
this, use one of the following methods:
+ Add the following line to the kernel configuration file, and rebuild
the kernel. Do not use this method if CPU hot-swap is supported by the
system, because it does not allow pfm to be easily unconfigured, as
required for a hot-swap; instead, use the sysconfig method below.
+ Enter the following command from the root account. Do not configure
pfm if CPU hot-swap is anticipated.
# sysconfig -c pfm
If pfm is configured, the CPU hot-swap procedure requires that it be
unconfigured, using the following command, before any CPU is swapped:
# sysconfig -u pfm
The autosysconfig program can be used to automatically load the
configurable pfm device at each system startup.
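On systems where CPU hot-swap matters, one way to limit how long pfm stays configured is to wrap a profiling run between the two sysconfig commands shown above. The following is a minimal sketch in POSIX shell. The function names run and with_pfm and the DRY_RUN variable are hypothetical. The sketch must be run from the root account and does not check whether pfm was already configured by other means.

```sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# with_pfm CMD...: configure pfm, run the profiling command, and then
# unconfigure pfm again so that a later CPU hot-swap is not blocked.
with_pfm() {
    run sysconfig -c pfm || return 1
    run "$@"
    status=$?
    run sysconfig -u pfm
    return $status
}
```

Setting DRY_RUN=1 before calling `with_pfm kprofile -each` prints the three commands instead of executing them, which is a convenient way to review the sequence first.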
The format of the data files produced by uprofile in Tru64 UNIX is
different from the format produced in versions of DIGITAL UNIX prior to
Version 4.0. The Tru64 UNIX data files include the names of selected
statistics in profile displays. To convert these data files to the
industry-standard format, at the expense of losing the names of the
statistics, use the pdtostd command.
The EV4 victim and novictim statistics rely on the external performance
counter pin connections as described in the EV4 chip specification. The DEC
3000/400, /500, /600, and /800 workstations have these connections.
Attempts to display either of these statistics on other platforms (while
allowed) will typically generate empty data.
The uprofile command is only supported on EV4 Pass 3 or later processors.
Attempts to use it on a Pass 2 processor will gather PC samples for every
process running on the system.
Using kprofile to generate statistics for a single command is only possible
on EV4 Pass 3 or later processors. Attempts to do this on a Pass 2
processor will gather statistics for the entire system, as if no command had been
Using kprofile with triggered mode also requires an EV4 Pass 3 or later
processor and cannot be performed with per-process monitoring.
Only one tool can use the performance counters at a time. A message similar
to "the counter device is busy" indicates that some other tool is using the
performance counters (or has used them but not cleaned up properly). If you
are sure no one else is using the performance counters, running
uprofile/kprofile with superuser privilege will attempt to reset the busy
status and proceed.
The performance counter device file.
The statistics file(s) generated by uprofile.
The statistics file(s) generated by kprofile.
The statistics file(s) generated with the -pids option.
The default kernel to profile.
pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1), sysconfig(8), autosys-