
uprofile(1)							  uprofile(1)


NAME

  uprofile, kprofile - Profile a program (uprofile) or kernel (kprofile) with
  Alpha on-chip performance counters


SYNOPSIS

  uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all  | -each  | -one]
  [-stride n] [-average] [-pixie] [-display  | prof-option...] [statistic...]
  program [argument...]

  kprofile [-v]	[-quiet] [-dirname path] [-[no]pids] [-all  | -each  | -one]
  [-stride n] [-average] [-pixie] [-display  | prof-option...]
  [-k kernel_name] [-t]	[-ra] [statistic...] [program [argument...]]


DESCRIPTION

  See prof_intro(1) for an introduction to the application performance tuning
  tools provided with Tru64 UNIX.

  The uprofile command uses the	Alpha on-chip performance counters to produce
  a finely-grained program-counter profile of a	user program. The command
  runs the program you specify with the	arguments you specify, collecting the
  selected statistics on the program's process and its descendants. It writes
  the profile data to the umon.out file, by default. If	the program calls
  shared libraries, those libraries are	not profiled.

  The kprofile command uses the	Alpha on-chip performance counters to produce
  a detailed program-counter profile of	the kernel. If you specify a program,
  kprofile runs	the program with the arguments you specify, and	it collects
  the selected statistics on the kernel	for the	duration of the	program's
  execution. If	you do not specify a program, kprofile collects	the selected
  statistics on	the kernel until you enter Ctrl/C or the kprofile process
  receives a SIGTERM signal. Note that if SIGINT (usually generated by
  entering Ctrl/C at the controlling terminal) is currently being ignored, it
  will continue to be ignored and SIGTERM must be used to terminate data
  collection. kprofile writes the profile data to the kmon.out file, by
  default.

  If you specify -display or any of the prof-options, the uprofile and
  kprofile commands display the profile by running the prof tool (with any
  specified prof-options).

  You can also run the prof command separately,	to help	analyze	the data in
  the umon.out or kmon.out file. The following examples	show how to invoke
  the prof command to analyze data in the respective files:

       % prof a.out umon.out
       % prof /vmunix kmon.out

  The CPU-time profile displayed by prof will not be accurate if the
  processors that executed the application do not all run at the same speed,
  as in certain multiprocessor systems containing EV67 or later processors.
  The inaccuracy may be avoided by using the hiprof (sampling) or cc -p/-pg
  profilers, or by running the application on a subset of the processors:

    +  Select a	single processor using the runon command.

    +  Check the processor speeds using the psrinfo -v command and run the
       application in a processor set comprising only processors that run at
       the same speed (see processor_sets(4)).


OPERANDS

  statistic
      The name of an event that your particular Alpha hardware can profile,
      as detailed in the STATISTICS section, below. If no statistic is named,
      machine cycles are counted, giving a CPU-time profile. One statistic
      can be specified for each of the hardware counters on your machine.

  program
      The name of the executable to run while profiling operations are being
      performed.

  argument
      An argument to pass to the program that is run. Multiple arguments can
      be specified, as needed by the program.


OPTIONS

  Options can be abbreviated to three characters, except the prof-options,
  which can be abbreviated (usually to one character) as in a prof command.
  For example, -qui is interpreted as -quiet, but -q is interpreted as -quit.
  (See the -display option for the supported prof-options.)

  For options that specify a procedure name (proc), C++	procedures can omit
  the argument type list, though this will match all overloaded	procedures
  with that name. To select a specific procedure, specify the full symbol
  name (as printed by the nm command). Symbol names containing spaces, *, and
  so on	must be	quoted.

  -v  Engages verbose mode, which prints some useful information about the
      program being profiled.

  -quiet
      Prevents informational and progress messages from being printed.

  -dirname path
      Specifies	the directory path in which the	profiling data file or files
      are created.

  -[no]pids
      [Disables] or enables the addition of the process-id number to the name
      of the profiling data file or files.

  -all | -each | -one
      Specifies which mode to use for profiling on multiprocessor machines.
      Using the -all option (the default) aggregates the data for all CPUs
      into one umon.out file. Using the -each option collects separate
      profiles for each CPU and writes the output into a set of files named
      umon.out.n, where n is the CPU number. Using the -one option profiles
      only the current CPU. For the -one option to work, the uprofile or
      kprofile program must be run using the runon command.
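
      After a run with -each, every per-CPU file can be analyzed on its own.
      The sh sketch below only prints the prof command lines, in the
      "prof program datafile" form shown in the description, for each
      umon.out.n file found in a directory. The function name prof_each_cmds
      and the program name a.out are illustrative assumptions, not part of
      uprofile or prof.

```shell
# Hypothetical helper: print one "prof" command per per-CPU data file.
# $1 is the profiled program, $2 the directory holding umon.out.n files
# (defaults to the current directory). Nothing is executed.
prof_each_cmds() {
    program=$1
    dir=${2:-.}
    for f in "$dir"/umon.out.[0-9]*; do
        [ -e "$f" ] || continue    # the pattern matched no files
        echo "prof $program $f"
    done
}

# Example: list the commands for data files in the current directory.
prof_each_cmds a.out .
```

      Piping the output to sh would run the commands; printing them first
      makes it easy to check which files were picked up.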

  -stride n
      Sets the granularity of the sample counts, where n is the	number of
      consecutive instructions grouped together	for each sample	count. The
      default is -stride 4. The	-asm, -heavy, and -lines prof-options need a
      separate sample count for	each instruction (for their reports to be
      precise enough), so these options imply -stride 1. This makes the
      output file four times bigger than the default size. The -stride
      argument must be a power of two (for example, 1, 2, 4, 8).
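
      As a small illustration of the power-of-two rule, the following sh
      sketch validates a candidate -stride value before it is placed on the
      command line. The function name is_valid_stride is hypothetical and is
      not part of uprofile or kprofile.

```shell
# Hypothetical helper: succeed only when the argument is a positive
# power of two, as the -stride option requires.
is_valid_stride() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for n in 1 4 6; do
    if is_valid_stride "$n"; then
        echo "-stride $n: ok"
    else
        echo "-stride $n: not a power of two"
    fi
done
```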

  -average
      Attempts to average samples within basic blocks so that each
      instruction within a basic block will show the same number of samples.
      Ensures fine grain profiles by setting stride to 1.

  -pixie
      Produces .Addrs and .Counts files similar to those produced by running
      an executable instrumented with pixie (see pixie(1)). Uses the cycles0
      statistic (freq on EV67) by default. Ensures fine grain profiles by
      setting stride to 1.

  -k kernel_name
      Overrides the name of the kernel to profile. (The default is the booted
      kernel.)

  -t  Enables triggered	mode for kprofile. This	option sets up all required
      information for running the performance counters,	but does not invoke
      them. See	the STATISTICS section for additional information.

  -ra Enables PCNTCALLER mode for kprofile. Collects profiling data on the
      caller of	certain	kernel utility routines	(for example, bcopy, bzero,
      simple_lock), instead of the routine itself.

  -display
      Runs prof on the resulting profile data file(s). The following prof
      options are supported:

      -asm
          Reports the profile as an annotated disassembly.

      -exclude proc
	  Excludes procedure proc from the profile but includes	its CPU	time
	  or other statistic in	the total.

      -Exclude proc
	  Excludes procedure proc from the profile and from the	total.

      -heavy
          Profiles source lines, printing those with the highest CPU time or
          other statistic first.

      -lines
          Reports the profile per source line within each procedure.

      -merge file
	  Merges all profile data files	into file.

      -numbers
          Prints each procedure's starting line number.

      -only proc
          Includes only procedure proc in the profile, but totals all
          procedures.

      -Only proc
	  Includes only	procedure proc in the profile and in the total.

      -procedures
          Profiles procedures, printing those with the highest CPU time or
          other statistic first.

      -quit n [[cum]%]
	  Truncates the	reports	after n	lines or after (cumulative) n percent
	  of the whole.


STATISTICS

  You specify the statistics that you want to collect for the program being
  profiled in one or more statistic operands.

  If you specify multiple statistics, uprofile and kprofile accumulate their
  results. You cannot then view	the results of any single statistic
  separately. Because collected	data is	merged into a single buffer,
  interpretation of multiply collected statistics may be difficult.

  The Alpha architecture implemented on your machine determines which
  statistics can be collected and the number of counters available for
  collecting multiple statistics at the same time. The implementation is
  indicated by the Alpha chip number, which can be displayed with the show
  config console command before booting Tru64 UNIX, or, after booting, by
  using the psrinfo -v command, or by calling getsysinfo (GSI_PROC_TYPE).
  Also, if the uprofile command is run without arguments, it will show how
  many counters and what statistics are available on your machine.

  All of the chips in the EV4 family (21064 [EV4], 21064A [EV45], 21066/21068
  [LCA4]) have two performance counter registers, each of which	can be
  separately programmed. The statistics	that each counter can collect are
  shown	in the following table:

  Counter0Stats	  Counter1Stats
  0disabled	  1disabled
  issues	  dcache
  pipedry	  icache
  loads		  dualissues
  pipefrozen	  mispredicts
  branches	  floatops
  cycles	  intops
  PALcycles	  stores
  nonissues	  novictims

  All of the chips in the EV5 family (21164 [EV5], 21164A [EV56], and 21164PC
  [PCA56]) have three performance counter registers, each of which can be
  separately programmed. Some of the counters are common to all EV5
  implementations, some are specific to EV5 and EV56, and some are specific
  to PCA56.

  The statistics that each of the common EV5 counters can collect are shown
  in the following table:

  Counter0Stats	  Counter1Stats	  Counter2Stats
  0disabled	  1disabled	  2disabled
  cycles0	  nonissues	  longstalls
  issues	  splitissue	  pcmispredicts
		  pipedry	  branchmispredicts
		  replay	  icachemisses
		  singleissues	  itbmisses
		  dualissues	  dcacheldmisses
		  tripleissues	  dtbmisses
		  quadissues	  ldsmerged
		  flowchanges	  ldureplays
		  intops	  fullreplays
		  floatops	  externalinput
		  loads		  cycles2
		  stores	  memorybarriers
		  icacheacc	  lockedloads

  The statistics that each of the EV5- and EV56-specific counters can collect
  are shown in the following table:

  Counter1Stats	  Counter2Stats
  scacheacc	  scachemisses
  scachereads	  scachereadmisses
  scachewrites1	  scachewritemisses
  scachevictim	  scachesharedwrites
  bcacheref	  scachewrites2
  bcachevictim	  bcachemisses
  sysreqs	  systeminvalidates

  The statistics that each of the PCA56-specific counters can collect are
  shown	in the following table:

  Counter1Stats		 Counter2Stats
  bcachereads		 bcachedreads
  bcachedreadhits	 bcachereadhits
  bcachedreadfills	 bcachereadfills
  bcachewrites		 bcachewritehits
  bcachecleanwritehits	 bcachewritefills
  bcachevictims		 sysreadflushhits
  readmisstwo		 sysreadflushmisses

  The EV6 chip has two performance counter registers, each of which can	be
  separately programmed. The statistics	that each of the EV6-specific
  counters can collect are shown in the	following table:

  Counter0Stats	  Counter1Stats
  0disabled	  1disabled
  cycles0	  cycles1
  retinst	  retcondbranch

  The default is to gather cycle statistics in the 0th counter and to disable
  other	counters.

  The EV67 chip	has two	kinds of performance counters: traditional aggregate
  counters and profile-me counters. The	traditional aggregate statistics that
  each of the EV67-specific counters can collect are shown in the following
  table. Any one statistic or statistic	combination may	be selected.

  Counter0Stats	  Counter1Stats
  0disabled	  1disabled
  cycles0	  replay
  retinst	  cycles1
  retinst	  bcachemisses

  If no aggregate statistics are selected, one profile-me statistic may be
  selected:

  Profile-me Statistics
  2disabled		abort		     abort_per_ret    arith_trap
  cbr_taken		cbr_taken_per_ret    cycles	      cycles_per_ret
  delay			delay_per_ret	     dstream_fault    dtb_miss
  dtb_miss_per_ret	dtb_miss3	     dtb_miss4	      early_kill
  early_kill_per_ret	fp_disabled	     freq	      icache_miss
			icache_parity	     inflt_bcache     inflt_replays

  inflt_retires		interrupt	     istream_accvio   itb_miss
  ldst_order		ldst_unalign	     map_stall	      map_stall_per_ret
  mispredict				     opcdec	      replay_trap

			retire		     trap	      trap_per_ret


  The default is to gather cycle statistics in the 0th counter and to disable
  other	counters.

  For descriptions of the statistics for all EV4, EV5, and EV6
  implementations, refer to pfm(7).

  You can disable any counter by specifying 0disabled, 1disabled, or
  2disabled as the counter statistic. You can use this feature to isolate
  specific event types, such as loads, without extraneous data being
  generated. You cannot disable all counters at the same time, choose two
  statistics for the same counter, or disable a counter once its statistic is
  specified.
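
  The counter rules above can be checked before a run. The following sh
  sketch does so for the EV6 statistics only, using the names from the EV6
  table. The function names ev6_counter_of and check_ev6_stats and the error
  messages are hypothetical; uprofile and kprofile perform their own
  validation and may report problems differently.

```shell
# Hypothetical helper: print the counter number (0 or 1) that an EV6
# statistic belongs to; return 1 for names not in the EV6 table.
ev6_counter_of() {
    case "$1" in
        0disabled|cycles0|retinst)       echo 0 ;;
        1disabled|cycles1|retcondbranch) echo 1 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if the statistic list names each counter at
# most once and does not consist solely of "disabled" entries.
check_ev6_stats() {
    seen0= seen1= enabled=0
    for stat in "$@"; do
        counter=$(ev6_counter_of "$stat") || {
            echo "unknown EV6 statistic: $stat"; return 1; }
        if [ "$counter" = 0 ]; then
            [ -z "$seen0" ] || {
                echo "two statistics for counter 0: $seen0 and $stat"; return 1; }
            seen0=$stat
        else
            [ -z "$seen1" ] || {
                echo "two statistics for counter 1: $seen1 and $stat"; return 1; }
            seen1=$stat
        fi
        case "$stat" in
            *disabled) ;;           # a disabled counter collects nothing
            *) enabled=1 ;;
        esac
    done
    # With no statistics named, the default (cycles on counter 0) applies.
    if [ $# -gt 0 ] && [ "$enabled" -eq 0 ]; then
        echo "all counters disabled"; return 1
    fi
    return 0
}

check_ev6_stats cycles0 retcondbranch && echo "cycles0 retcondbranch: ok"
check_ev6_stats cycles0 retinst || true
check_ev6_stats 0disabled 1disabled || true
```

  Naming a statistic and the matching disabled entry together (for example,
  cycles0 with 0disabled) is rejected by the same two-per-counter check.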

  When you specify no counter statistics, uprofile and kprofile	count cycles
  on counter 0 by default, and display (through	prof) a	profile	in terms of
  seconds used by each procedure in the program, except for any shared
  libraries.

  For noncycle statistics, the displayed profile shows the number of samples
  recorded, the	sampling interval (events per second), and the total number
  of events that this implies. Most noncycle statistics	of the EV5 family
  CPUs are recorded about six cycles after the instruction that	triggered the
  sample. So, when using prof's -asm or -lines option, the samples should be
  associated with one of the previously executed few instructions or lines.
  The icacheacc, icachemisses, and dtbmisses statistics are usually
  attributed precisely.

  To perform a detailed	analysis of short sections of kernel code, use the
  kprofile command with	triggered mode (invoked	with the -t option). When you
  use this mode, kprofile performs all of the required setup for enabling the
  counters as normal, but does not invoke them.	You can	insert counter start
  or stop commands into	the kernel code	to be instrumented as follows:

       Turn counters on:  wrperfmon (PFOPT, 1)
       Turn counters off: wrperfmon (0)

  You can turn the counters on and off repeatedly to collect data over many
  iterations or	multiple sections of code.

  The macro PFOPT is defined in <sys/pfcntr.h>.


NOTES

  The interrupt load that profiling places on the system may affect
  performance, but usually the effect is insignificant.

  The kernel in	use must have the pfm pseudo-device configured into it.	To do
  this,	use one	of the following methods:

    +  Add the following line to the kernel configuration file,	and rebuild
       the kernel. Do not use this method if CPU hot-swap is supported by the
       system, because it does not allow pfm to	be easily unconfigured,	as
       required	for a hot-swap;	instead, use the sysconfig method below.
	    pseudo-device	pfm

    +  Enter the following command from	the root account. Do not configure
       pfm if CPU hot-swap is anticipated.
	    # sysconfig	-c pfm

       If pfm is configured, the CPU hot-swap procedure	requires that it be
       unconfigured, using the following command, before any CPU is swapped:
	    # sysconfig	-u pfm

       The autosysconfig program can be used to automatically load the
       configurable pfm device at each system startup.

  The format of the data files produced by uprofile in Tru64 UNIX is
  different from the format produced in versions of DIGITAL UNIX prior to
  Version 4.0. The Tru64 UNIX data files include the names of selected
  statistics in profile displays. To convert these data files to the
  industry-standard format, at the expense of losing the names of the
  statistics, use the pdtostd command.


RESTRICTIONS

  The EV4 victim and novictim statistics rely on the external performance
  counter pin connections as described in the EV4 chip specification. The DEC
  3000/400, /500, /600, and /800 workstations have these connections.
  Attempts to display either of these statistics on other platforms (while
  allowed) will typically generate empty data.

  The uprofile command is only supported on EV4	Pass 3 or later	processors.
  Attempts to use it on	a Pass 2 processor will	gather PC samples for every
  process running on the system.

  Using kprofile to generate statistics for a single command is only possible
  on EV4 Pass 3 or later processors. Attempts to do this on a Pass 2
  processor will gather statistics for the entire system, as if no command
  had been specified.

  Using	kprofile with triggered	mode also requires an EV4 Pass 3 or later
  processor and	cannot be performed with per-process monitoring.

  Only one tool	can use	the performance	counters at a time. A message similar
  to "the counter device is busy" indicates that some other tool is using the
  performance counters (or has used them but not cleaned up properly). If you
  are sure no one else is using	the performance	counters, running
  uprofile/kprofile with superuser privilege will attempt to reset the busy
  status and proceed.


FILES

      The performance counter device file.

  umon.out
      The statistics file(s) generated by uprofile.

  kmon.out
      The statistics file(s) generated by kprofile.

      The statistics file(s) generated with the -pids option.

  /vmunix
      The default kernel to profile.


SEE ALSO

  Introduction: prof_intro(1)

  pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1), sysconfig(8),
  autosysconfig(8), processor_sets(4)

  Programmer's Guide