Manual: (OSF1-V5.1-alpha)



prof(1)								      prof(1)



NAME

  prof,	pixstats - Analyzes profile data

SYNOPSIS

  prof [options] [prog_name [PC-sampling_data_file]...]

  prof -pixie  [options] [prog_name [Addrs_file	 |  Counts_file]...]

  prof -pixstats  [options] [prog_name [Addrs_file  |  Counts_file]...]

  pixstats [options] [prog_name	[Addrs_file |  Counts_file]...]

OPERANDS

  prog_name
      Name of the program executable to	be profiled.  This program should be
      compiled with the	-g1, -g2, or -g3 option	to obtain more complete	pro-
      filing information.  If the default symbol table level (-g0) has been
      used, line number	information, static procedure names, and file names
      are unavailable to the profiling code.

  PC-sampling_data_file
      Name of a	profiling data file (default mon.out) produced by executing a
      program that has been linked with	the cc -p command.

  Counts_file
      Name of an instruction-counts file produced by executing a program that
      has been instrumented with pixie.	If no Counts_file or Addrs_file	is
      specified, prog_name.Counts is used if found in the current working
      directory.

  Addrs_file
      Name of an instruction-address file produced when	the executable or
      shared library object is instrumented with pixie.	By default, the	path
      of each object.Addrs file	will be	recorded in the	Counts_file, so	they
      do not need to be	specified. The order of	precedence for finding an
      Addrs_file is as follows:	Addrs_file path	specified on command line,
      current directory, directory of object specified in command line argu-
      ment, directory where pixie created it.
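
      The precedence list above is a "first match wins" search. The following
      is a minimal C sketch of it. It assumes the caller has already built the
      candidate paths in the documented order. The helper name first_existing
      is hypothetical and is not part of prof.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of prof): return the first candidate
 * path that exists, or NULL if none does. NULL entries stand for an
 * unknown location and are skipped. The caller is assumed to pass the
 * candidates in the documented order: command-line path, current
 * directory, directory of the object named on the command line,
 * directory where pixie created the file. */
const char *first_existing(const char *const candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (candidates[i] != NULL && access(candidates[i], F_OK) == 0)
            return candidates[i];
    }
    return NULL;
}
```

      A location that is not known, such as an object directory that was not
      given on the command line, can be passed as NULL.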

OPTIONS

  For each prof	option,	you need to type only enough of	the name to distin-
  guish	it from	the other options. If you do not specify any options, prof
  uses -procedures by default.	Always specify -pixie or -pixstats when	you
  process .Addrs and .Counts files.

  The prof command accepts the following options:

  -all
      Causes the profiles for all shared libraries (if any) described in the
      data file(s) to be displayed, in addition	to the profile for the exe-
      cutable.

  -asm
      Causes the profiler to print the assembly	instructions for each subrou-
      tine along with the cycle	counts for each	instruction. The subroutines
      are sorted from highest cycle count to lowest. The instructions for
      each subroutine are printed in order; they are not sorted	by cycle
      count.

      When used	without	the -pixie option for a	PC-sampling profile, the CPU
      time used	by each	instruction is presented in milliseconds.  (For	upro-
      file and kprofile, per-instruction sample	counts are also	provided for
      events other than	time.)

  -clock megahertz
      Alters the appropriate parts of the listing to reflect the clock speed
      of the CPU. By default, the cycle time of the processor on which the
      program was run is used. (Use this option only with the -pixie option.)
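
      The conversion that -clock depends on is simple arithmetic: at f
      megahertz, one cycle lasts 1/(f * 10^6) seconds. The following C sketch
      shows it. The function name is hypothetical.

```c
/* Hypothetical helper: convert a cycle count to seconds for a CPU
 * running at the given clock rate in megahertz. */
double cycles_to_seconds(double cycles, double megahertz)
{
    return cycles / (megahertz * 1e6);
}
```

      For example, 1,000,000,000 cycles take 4 seconds at 250 MHz and 2
      seconds at 500 MHz.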

  -disassemble
      Disassembles and shows the analyzed object code. (Use this option	only
      with the -pixstats option.)

  -dislimit f
      Limits the disassembly to basic blocks whose execution count exceeds f
      percent of the total instructions executed. (Use this option only with
      the -pixstats option.)

  -exclude procedure_name
      If you use one or more -exclude options, the profiler omits the
      specified procedure and its descendants from the listing. If any option
      uses an uppercase	"E" (for "Exclude"), prof also omits that procedure
      from the base upon which it calculates percentages. To represent all of
      the variations of	an overloaded C++ function name, you can specify just
      the part of the name up to but not including the "(".
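
      The following is a minimal C sketch of this matching rule. It assumes
      that a name given to -exclude (or -only) matches either the exact
      procedure name or any name that continues with "(" immediately after
      it. The helper proc_name_matches is hypothetical, and prof's own
      comparison may differ in detail.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if pattern names this procedure exactly,
 * or names every overload of it (pattern followed directly by "("). */
bool proc_name_matches(const char *pattern, const char *name)
{
    size_t len = strlen(pattern);
    if (strncmp(pattern, name, len) != 0)
        return false;                    /* prefix differs */
    return name[len] == '\0' || name[len] == '(';
}
```

      Under this rule, "draw" matches draw(int) and draw(double, double) but
      not drawAll(int).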

  -excobj object_file_name
      Causes the profile for the named executable or shared library not	to be
      printed.	You can	use this option	multiple times in a single prof	com-
      mand.

  -feedback filename
      Produces a file with information that the	compiler system	can use	to
      decide which parts of the	program	will benefit most from global optimi-
      zation and which parts will benefit most from in-line procedure substi-
      tution (requires basic-block counting). (Use this	option only with the
      -pixie option.)

      This option is for compilers whose -feedback option requires a feedback
      file (rather than	an executable file) and	that do	not support the	prof
      command's	-update	option.	 For compilers that support the	-update
      option, better results can be achieved using that	option instead of the
      (prof) -feedback option.

  -heavy
      Reports the most heavily used lines in descending	order of use.

  -incobj object_file_name
      Causes the profile for the named shared library to be printed, in	addi-
      tion to the profile for the executable. You can use this option multi-
      ple times	in a single prof command.

  -invocations
      For each procedure, reports how many times the procedure was invoked
      from each	of its possible	callers	(requires basic-block counting).  For
      this listing, the	-exclude and -only options apply to callees, but not
      to callers.  (Use	this option only with the -pixie option.)

  -Ldir
      Changes the library directory search order for shared object libraries
      so that prof looks for them in dir before the location recorded in the
      profile data file and the default library directories. You can specify
      multiple -Ldir switches to specify several directory names.

  -L  Changes the library directory search order for shared object libraries
      so that prof never looks for them	in the default library directories.
      Use this option when the default library directories should not be
      searched and only	the directories	specified by -Ldir are to be
      searched.

  -lines
      Gives the	lines in order of occurrence within procedures.	 The pro-
      cedures are sorted in descending order of	use.

  -merge filename
      Sums the sampling	data files (or,	in pixie mode, the .Counts files) and
      writes the result	into a new file	with the specified name. The -only
      and -exclude options have	no effect on the merged	data.
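
      Summing data files amounts to adding corresponding counters. The
      following C sketch assumes each run's counts are already loaded into
      arrays of equal length, with one entry per basic block or sampling
      bucket of the same executable. The name merge_counts is hypothetical,
      and reading or writing the actual file formats is not shown.

```c
#include <stddef.h>

/* Hypothetical sketch of -merge: out[i] becomes the sum of runs[r][i]
 * over all nruns runs. Every array is assumed to hold n counters for
 * the same executable, in the same order. */
void merge_counts(unsigned long *out, const unsigned long *const runs[],
                  size_t nruns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long sum = 0;
        for (size_t r = 0; r < nruns; r++)
            sum += runs[r][i];
        out[i] = sum;
    }
}
```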

  -nocounts
      Uses 1 for each basic block count. (Use this option only with the	-pix-
      stats or -pixie option.)

  -numbers
      Prints each procedure's starting line number if source file information
      is available from	the object file.

  -only	procedure_name
      If you use one or	more -only options, the	profile	listing	includes only
      the named	procedures, rather than	the entire program. If any option
      uses an uppercase "O" (for "Only"), prof uses only the named procedures,
      rather than the entire program, as the base upon which it	calculates
      percentages. To represent	all of the variations of an overloaded C++
      function name, you can specify just the part of the name up to but not
      including	the "(".
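
      The only difference between -only and -Only is the denominator used for
      percentages. The following C sketch shows this, assuming per-procedure
      totals are held in an array. The name selected_pct is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: percentage reported for procedure `which`.
 * times[] holds per-procedure totals (cycles or samples); selected[]
 * marks the procedures named with -only or -Only. */
double selected_pct(const double times[], const bool selected[], size_t n,
                    size_t which, bool uppercase_only)
{
    double base = 0.0;
    for (size_t i = 0; i < n; i++)
        if (!uppercase_only || selected[i])
            base += times[i];   /* -only: whole program; -Only: named ones */
    return base > 0.0 ? 100.0 * times[which] / base : 0.0;
}
```

      Suppose the totals are 50, 30, and 20 and only the second procedure is
      named. It is reported as 30 percent under -only and as 100 percent
      under -Only.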

  -pixie
      Selects pixie mode, as opposed to	sampling mode.

  -pixstats
      Selects generation of an alternative pixie-mode report for basic-block
      profiling	data, as previously produced by	the pixstats(1)	command. All
      options of the previous version of pixstats(1) are recognized, for com-
      patibility.

  -procedures
      Reports time spent per procedure (using data obtained from sampling or
      basic-block counting; the	listing	tells which one). For basic-block
      counting,	this option also reports the number of invocations per pro-
      cedure, including	the aggregated invocations of any alternate entry
      points.

  -quit	n
      Truncates	listings after n lines (if n is	an integer), after the first
      entry that represents less than n	percent	of the total (if n is fol-
      lowed immediately	by a "%" character), or	after enough entries have
      been printed to account for n percent of the total (if n is followed
      immediately by "cum%").  For example, "-quit 15" truncates each part of
      the listing after	15 lines of text, "-quit 15%" truncates	each part
      after the	first line that	represents less	than 15	percent	of the whole,
      and "-quit 15cum%" truncates each	part after the line that brought the
      cumulative percentage above 15 percent.
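
      The three truncation rules can be sketched in C as follows. The sketch
      assumes each entry's percentage is already known and in listing order.
      The names quit_after and quit_mode are hypothetical.

```c
#include <stddef.h>

enum quit_mode { QUIT_LINES, QUIT_PERCENT, QUIT_CUM_PERCENT };

/* Hypothetical sketch: number of entries printed before truncation.
 * pct[] holds each entry's share of the total, in listing order. */
size_t quit_after(const double pct[], size_t n, enum quit_mode mode, double q)
{
    if (mode == QUIT_LINES)                  /* "-quit 15": at most 15 lines */
        return (double)n <= q ? n : (size_t)q;

    double cum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (mode == QUIT_PERCENT && pct[i] < q)
            return i + 1;   /* "-quit 15%": keep the first entry below q% */
        cum += pct[i];
        if (mode == QUIT_CUM_PERCENT && cum > q)
            return i + 1;   /* "-quit 15cum%": keep the entry crossing q% */
    }
    return n;
}
```

      Consider entries of 40, 25, 14, 10, 6, and 5 percent. Both "-quit 3"
      and "-quit 15%" keep three entries. "-quit 50cum%" keeps two, because
      40 + 25 is the first sum above 50.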

  -testcoverage
      Reports all lines	that never executed. (Use this option only with	the
      -pixie option.)

  -totals
      For -procedures and -invocations listings, prints	cumulative statistics
      for the entire object file instead of for	each procedure in the object.

  -truecycles [0,1,2]
      Performs additional analysis of the program to provide a more accurate
      reading of cycles than the default, which assumes that each instruction
      executes in one cycle. The higher the argument value,
      the more accurate	the reading, although the profiler will	run slower,
      and memory-access	delays are still not reflected.	This option has	lit-
      tle or no	effect on EV6 (21264) and later	Alpha systems. (Use this
      option only with the -pixie option.)

  -update
      Updates the program executable (prog_name) with profiling	information
      in the specified .Counts files, for use in future	cc -feedback
      prog_name	command(s). This option	requires that prog_name	have been
      compiled with the	-feedback prog_name option or updating will fail.
      This option will not generate a display unless another option forcing
      the display behavior is specified. (Use this option only with the
      -pixie option.)

  -version
      Prints the tool's	version	number.

  -zero
      Prints a list of procedures that were never invoked (requires basic-
      block counting). (Use this option	only with the -pixie option.)

DESCRIPTION

  The prof command analyzes one	or more	data files generated by	the
  compiler's execution-profiling system	and produces a listing.	The prof com-
  mand can also	combine	those data files or produce a feedback file that lets
  the optimizer	take into account the program's	run-time behavior during a
  subsequent compilation.  Profiling is	a three-step process:

   1.  Compile the program.

   2.  Execute the program.

   3.  Run prof to analyze the data.

  The compiler system provides two kinds of profiling:

  PC-sampling
      Interrupts the program periodically, recording the value of the program
      counter.

  Basic-block counting
      Divides the program into blocks delimited	by labels, jump	instructions,
      and branch instructions. It counts the number of times each block	exe-
      cutes.
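
  The following C function is instrumented by hand with one counter per
  block, to make "basic block" concrete. pixie does something similar at the
  machine-code level. The block boundaries shown are an assumed source-level
  approximation, because the real blocks depend on the code the compiler
  generates. The names bb_count and count_positive are hypothetical.

```c
/* Hypothetical per-block counters (pixie keeps real counts in
 * prog_name.Counts). Static storage starts at zero. */
static unsigned long bb_count[6];

/* Count the positive elements of a[0..n-1], bumping each block's
 * counter as the block is entered. */
int count_positive(const int *a, int n)
{
    int count, i;

    bb_count[0]++;          /* B0: entry; falls into the loop test */
    count = 0;
    i = 0;
loop_test:
    bb_count[1]++;          /* B1: starts at a label (branch target) */
    if (i >= n)
        goto done;          /* conditional branch ends B1 */
    bb_count[2]++;          /* B2: fall-through from B1 */
    if (a[i] <= 0)
        goto next;          /* conditional branch ends B2 */
    bb_count[3]++;          /* B3: runs only for positive elements */
    count++;
next:
    bb_count[4]++;          /* B4: join point; ends with a jump back */
    i++;
    goto loop_test;
done:
    bb_count[5]++;          /* B5: exit */
    return count;
}
```

  For a = {1, -2, 3}, the block counts are 1, 4, 3, 2, 3, and 1. The loop
  test runs once more than the loop body, and B3 runs only for the two
  positive elements.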

  The uprofile and kprofile tools provide a third kind of profiling, perfor-
  mance counter sampling, which uses the on-chip performance counters of the
  Alpha processor.

  The following	sections describe how to perform the various kinds of profil-
  ing.


  PC-Sampling Profiles


  To use PC-sampling, compile your program with the -p option (strictly
  speaking, it is sufficient to use this option only when linking the
  program). Then run the program. The profiling startup routine linked in by
  -p calls monstartup to allocate extra memory to hold the profiling data.
  If the program terminates normally or calls exit(2), it records the data in
  a file at the end of execution.

  If your program uses shared libraries, note that only	its call-shared	por-
  tion is profiled in detail. Only the total time spent	in each	shared
  library is recorded. To individually profile all library routines a program
  uses,	build the program with the -non_shared switch (by default, the com-
  piler	produces a call-shared object unless -non_shared is explicitly speci-
  fied), or set	the PROFFLAGS environment variable as described	in the
  Environment Variables	section.

  After	running	your program, use prof to analyze the PC-sampling data file.
  For example:

       cc -c myprog.c
       cc -p -o	myprog myprog.o
       myprog			 (generates mon.out)
       prof myprog mon.out

  When you use prof for	PC-sampling, the program name defaults to a.out. The
  PC-sampling data file	name defaults to mon.out; if you specify more than
  one PC-sampling data file, prof reports the sum of the data.

  PC-Sampling Environment Variables


  You can use environment variables to change the default PC sampling and
  profile data collection behavior. The	variables are PROFDIR and PROFFLAGS.
  The general form for setting these variables is:

    +  For C shell: setenv varname "value"

    +  For Bourne shell: varname="value"; export varname

    +  For Korn shell: export varname="value"

  In these forms, varname can be one of the following:

  PROFDIR
      This environment variable	causes PC-sampling data	files to be generated
      with unique file names in	a specified directory.

      Specify a directory path as the value. The profiling data is then
      written to the file path/pid.progname, where path is the pathname, pid
      is the process ID of the executing program, and progname is the program
      name.

  PROFFLAGS
      This environment variable	can take any of	the following values:

      -threads
	  Causes a separate data file to be generated for each thread. The
	  name of the data file	takes the following form: pid.sid.progname.

          In this name, pid is the process ID of the program, sid is the
          sequence number of the thread, and progname is the name of the
          program being profiled.

      -all
	  Causes the program to	fully profile all the permanently loaded
	  shared libraries, in addition	to the nonshared or call-shared	exe-
	  cutable.

      -incobj name
	  Causes the program to	profile	only the named executable or shared
	  library.

      -excobj name
	  Causes the program not to profile the	named executable or shared
	  library.

      -stride
          Causes the program to change the ratio of text segment stride size to PC-
	  sample counter buffer	size, that is, the number of instructions
	  that are counted together in a single	counter	word. The appropriate
	  ratio	involves a tradeoff of size versus precision.  Strides of 1,
	  2, 4,	and 8 are supported.  A	special	stride of 0 causes a single
	  PC-sample count to be	recorded for each text segment.

	  The default stride is	2 for the executable, and 0 for	each of	its
          shared libraries. If -all or -incobj is specified, all selected
	  objects are profiled with the	same stride.

      -sigdump signal-name
	  Automatically	establishes monitor_signal(3) as the signal handler
	  for the named	signal,	and it causes monitor_signal(3)	to zero	the
	  profile after	it is written to a file. This allows a signal to be
	  sent several times without the successive profiles overlapping, if
	  the file is renamed. The asynchronous	nature of a signal may cause
	  small	variations in the profile. Unrecognized	signal-names are
	  ignored.  The	-threads option	is ignored if combined with -sigdump.

      -dirname directory
	  Specifies the	directory path in which	the profiling data file	or
	  files	are created.

      -[no]pids
          Enables (-pids) or disables (-nopids) the addition of the process
          ID number to the name of the profiling data file or files.
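
  The -stride setting can be pictured as a mapping from a sampled program
  counter to a counter word. The following C sketch assumes 4-byte Alpha
  instructions and a counter buffer that begins at the start of the text
  segment. The real buffer layout used by the profiling library is not
  described here, and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: index of the counter word that a PC sample in a
 * text segment increments. With 4-byte instructions, a stride of s
 * groups s consecutive instructions into one counter; a stride of 0
 * keeps a single counter for the whole segment. */
uint64_t pc_counter_index(uint64_t pc, uint64_t text_start, unsigned stride)
{
    if (stride == 0)
        return 0;
    return (pc - text_start) / (4u * (uint64_t)stride);
}

/* Hypothetical sketch: counter words needed for text_bytes of code. */
uint64_t pc_counter_words(uint64_t text_bytes, unsigned stride)
{
    if (stride == 0)
        return 1;
    uint64_t insns = text_bytes / 4;
    return (insns + stride - 1) / stride;    /* round up */
}
```

  Under this sketch, raising the stride from the default of 2 to 4 halves
  the number of counter words. It also lumps the samples of four
  instructions together, which is the size-versus-precision tradeoff
  described for -stride.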

  You can use the PROFDIR and PROFFLAGS	environment variables together.	For
  more information, see	the Programmer's Guide.
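
  The data file names described for PROFDIR and for the -threads flag can be
  sketched with snprintf. The function names are hypothetical. The exact
  formatting that the profiling library uses (for example, for the thread
  sequence number) is assumed.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical sketch of the PROFDIR name: path/pid.progname.
 * Returns the length the full name needs, as snprintf does. */
int profdir_name(char *buf, size_t size, const char *path, pid_t pid,
                 const char *progname)
{
    return snprintf(buf, size, "%s/%ld.%s", path, (long)pid, progname);
}

/* Hypothetical sketch of the PROFFLAGS -threads name: pid.sid.progname. */
int thread_data_name(char *buf, size_t size, pid_t pid, unsigned sid,
                     const char *progname)
{
    return snprintf(buf, size, "%ld.%u.%s", (long)pid, sid, progname);
}
```

  For process 1234 running myprog with PROFDIR set to /tmp/prof, this sketch
  produces /tmp/prof/1234.myprog. For thread 2, it produces 1234.2.myprog.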

  Basic-Block Counting


  To use basic-block counting, compile your program without the	option -p.
  Use the pixie	program	to translate your program into a profiling version
  and generate a file (prog_name.Addrs)	containing block addresses. Then, run
  the pixie version of the program, which (assuming the program terminates
  normally or calls exit(2)) will generate a file (prog_name.Counts)
  containing block counts.

  After	running	the pixie version of your program, use prof with the -pixie
  option to analyze the	.Addrs and .Counts files.  Notice that you must
  specify the name of your original program, not the name of the .pixie	ver-
  sion.	For example:

       cc -c myprog.c
       cc -o myprog myprog.o
       pixie myprog	       (generates myprog.Addrs and myprog.pixie)
       myprog.pixie			       (generates myprog.Counts)
       prof -pixie myprog myprog.Addrs myprog.Counts

  When you use prof with the -pixie option, the	.Addrs file name defaults to
  prog_name.Addrs,  and	the .Counts file name defaults to prog_name.Counts.
  Note that, when the .Counts file name	defaults to prog_name.Counts, prof
  does not attach any path prefix to prog_name,	and it looks for the .Counts
  file in the current working directory. If you	specify	more than one .Counts
  file,	prof reports the sum of	the data.

  For each shared library selected for profiling, the prof command searches
  for an .Addrs	file in	the following locations	if the	file location is not
  explicitly specified on the command line:

    +  Current directory

    +  Directory in which the object file is located if	the location of	the
       object file is explicitly specified on the command line

    +  Directory in which pixie	created	it, as recorded	in the .Counts file

  For each selected shared library, the	prof command searches for an object
  file in the following	locations:

    +  Directories specified in	-Ldir options

    +  Directory in which pixie	found it, as recorded in the .Addrs file, if
       the -L option is	specified

    +  Standard	library	search directories, as searched	by ld, if the -L
       option is not specified

  Basic-Block Statistics


  Use the -pixstats option to get an alternative profile.  All options of the
  previous version of the pixstats(1) command are recognized, for compatibil-
  ity.

  If a disassembly is requested, all basic blocks (or those whose execution
  count	exceeds	the -dislimit percentage of total instructions)	are disassem-
  bled,	in increasing address order. Each block	is labeled with	its procedure
  name and any offset from the start of	the procedure. For each	instruction,
  the relative estimated CPU cycle at which the	instruction executes is
  printed, plus	its source line, address, binary code, and assembly language.
  The total CPU	cycles used by one execution of	the block, the number of
  times	it was executed, and its percentage of all instructions	executed are
  printed at the end of	the block, following any line reporting	a non-zero
  delay	caused to a follow-on block.
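
  The per-block percentage and the -dislimit filter can be sketched in C. The
  following sketch assumes that a block's share of all executed instructions
  is its execution count times its instruction count, divided by the total.
  It also assumes a literal reading of the -dislimit rule. Both function
  names are hypothetical.

```c
/* Hypothetical sketch: share of all executed instructions that came
 * from one block, as a percentage. */
double block_instr_pct(unsigned long exec_count, unsigned ninstr,
                       unsigned long total_instr)
{
    if (total_instr == 0)
        return 0.0;
    return 100.0 * (double)exec_count * (double)ninstr / (double)total_instr;
}

/* Assumed reading of -dislimit f: disassemble a block only if its
 * execution count exceeds f percent of the total instructions. */
int dislimit_selects(unsigned long exec_count, unsigned long total_instr,
                     double f)
{
    return (double)exec_count > f / 100.0 * (double)total_instr;
}
```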

  The main report begins with a	record of the command line. This is followed
  by a summary of the program's	behavior:

    +  Total CPU cycles	used by	the profiled objects, plus the equivalent
       number of seconds

    +  Total number of instructions executed

    +  Total delay caused by instructions executed in the preceding basic
       block

    +  Total integer and floating-point	no-op, arithmetic and logical, logi-
       cal, shift, load, store,	load and store,	load followed by load, load
       and store and fetch (data bus use), load	and store relative to the
       stack or	global pointers, floating-point, floating-point	compare, con-
       ditional	branch instructions executed (itemized). Also, total number
       of branch instructions executed whose target instruction	is another
       branch; and total number	of such	branches that are estimated to be
       taken, rather than executing the	next instruction in line.

    +  Total basic blocks, procedure calls, and	branches that skip a single
       instruction that	were executed.

  Next,	some ratios are	printed:

    +  Stores :	stores + loads

    +  Instructions : basic block

    +  Instructions : branches

    +  Backward	branches : branches

    +  CPU cycles : procedure calls

    +  Instructions : procedure	calls

    +  Integer no-ops :	integer	and floating-point no-ops

    +  Floating-point no-ops : integer and floating-point no-ops

    +  Floating-point pipeline interlocks : floating-point operators

  Next,	basic blocks are analyzed according to how many	instructions they
  contain. For each size, pixstats reports the execution count, its
  percentage and cumulative percentage relative to both instructions and basic
  blocks, the number of	instructions contained in blocks of that size, the
  percentage and cumulative percentage of this relative	to all instructions,
  and the CPU-cycle cost per instruction of blocks of that size. Then, pix-
  stats	prints various averages	and quartiles of basic block size, plus	the
  largest basic	block execution	count encountered (to indicate the chance of
  integer overflow in the analysis).

  Next,	pixstats analyzes the number of	registers (integer and floating-
  point) that are saved	on procedure entry (and	restored on exit).  It prints
  the number of	procedure entries that save a given number of registers, and
  the percentage and cumulative	percentage of this relative to all procedure
  entries, all registers saved,	and all	instructions executed. Finally,	it
  prints some averages and ratios.

  The next two tables contain information on the sizes of executed pro-
  cedures' stack frames	and the	frequency of execution of each kind of
  instruction. Frame sizes are reported	in "bits"; for example,	6 bits means
  a 32-	to 48-byte stack frame.	The number, percentage,	and cumulative per-
  centage of executed calls to procedures with the given frame size is
  printed. Similarly, the execution count is printed for each machine
  instruction code, but	this table is ordered by decreasing usage.

  The next four	tables are similar. They provide information about the size
  of literals used by various categories of Alpha instructions:

    +  ADD,SUB,CMP instructions

    +  AND,BIC,BIS,XOR,CMOV instructions

    +  MUL instructions

    +  SHIFT,EXT,INS,MSK,ZAP instructions

  (Note	that a table may be omitted if there is	no use of literals in the
  program for the particular instruction category). For	each of	these tables
  the size of the literal is reported in bits (for example, 4 bits means the
  literal is greater than or equal to 8	and less than 16).
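
  The "size in bits" used in these tables is the number of bits needed to
  hold a value, so 4 bits covers 8 through 15. The following C sketch
  assumes it applies to non-negative magnitudes. The name bit_size is
  hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: bits needed to represent an unsigned value.
 * 4 bits covers 8..15 and 6 bits covers 32..63; 0 needs 0 bits. */
unsigned bit_size(uint64_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```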


  The next six tables are similar.  They contain information on	the size of
  the memory displacement from a base register:

    +  LDA displacement	from 0 (used like a load immediate instruction)

    +  LDAH displacement from 0	(used like a load immediate high)

    +  Branch

    +  SP-based	load/store (load or store within a stack frame)

    +  GP-based	load/store (load or store within a global offset table)

    +  All load	or store instructions

  Again, the "size" of the displacement	is reported in bits; for example, 6
  bits means a 32 to 63	byte displacement. For both positive displacements
  (in the "0-extend" column) and negative displacements	(in the	"1-extend"
  column), the execution count is printed along	with percentage	and cumula-
  tive percentage. The summed cumulative percentage is printed last (in	the
  "Total" column).

  In the "static" analysis of instructions, each instruction is	counted	once
  per executed basic-block. The	"static" distribution will be the same as the
  regular opcode distribution when -nocounts is	specified. Following "static"
  totals for instructions and basic blocks, the	number and percentage of each
  instruction code is listed.
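
  The difference between the regular and "static" opcode counts can be
  sketched in C. This sketch assumes that per-block opcode occurrences and
  block execution counts are available as arrays. The name opcode_tally is
  hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: executions of one opcode. ops_in_block[b] is how
 * often the opcode appears in block b; exec_count[b] is how often block
 * b ran. The regular tally weights each block by its execution count;
 * the static tally counts each executed block once. */
unsigned long opcode_tally(const unsigned ops_in_block[],
                           const unsigned long exec_count[],
                           size_t nblocks, int static_mode)
{
    unsigned long total = 0;
    for (size_t b = 0; b < nblocks; b++) {
        if (exec_count[b] == 0)
            continue;               /* never-executed blocks add nothing */
        total += ops_in_block[b] * (static_mode ? 1UL : exec_count[b]);
    }
    return total;
}
```

  If every block count is replaced by 1, as -nocounts does, the two tallies
  coincide.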

  The next two tables contain information on how many times each integer and
  floating-point register was accessed,	plus its percentage, ordered by
  register number.  For	integer	registers, the number and percent of uses as
  a base register in memory operations is also listed.

  Finally, pixstats prints a flat profile of CPU cycles	used by	procedures.
  This includes	the CPU	cycles used by the procedure, the percentage of	the
  total, the cumulative	percentage, the	number of instructions executed	as
  part of the procedure, its average number of CPU cycles per instruction,
  the number of	calls made to the procedure, the average number	of CPU cycles
  per call, and	the procedure name. If -numbers	is specified, the object and
  source file names and	line number are	also printed.
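
  The derived columns of this flat profile are simple ratios. The following C
  sketch uses a hypothetical proc_stats structure to hold the three raw
  figures per procedure.

```c
/* Hypothetical per-procedure figures from the flat profile. */
struct proc_stats {
    unsigned long cycles;        /* CPU cycles used by the procedure */
    unsigned long instructions;  /* instructions executed in it */
    unsigned long calls;         /* calls made to it */
};

double cycles_per_instruction(const struct proc_stats *p)
{
    return p->instructions ? (double)p->cycles / (double)p->instructions : 0.0;
}

double cycles_per_call(const struct proc_stats *p)
{
    return p->calls ? (double)p->cycles / (double)p->calls : 0.0;
}

double percent_of_total(const struct proc_stats *p, unsigned long total_cycles)
{
    return total_cycles ? 100.0 * (double)p->cycles / (double)total_cycles : 0.0;
}
```

  Consider a procedure that used 1500 cycles over 1000 instructions and 10
  calls, in a run of 6000 cycles. It averages 1.5 cycles per instruction and
  150 cycles per call, and accounts for 25 percent of the total.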

  Performance Counter Samples


  After running the uprofile or kprofile utility to collect profiling data for
  your program or the kernel, respectively, run prof to examine the resulting
  mon.out or kmon.out file, as follows:

    +  For uprofile output: prof prog_name mon.out

    +  For kprofile output: prof /vmunix kmon.out

  Use prof as for PC sampling, except that only	the executable has a profile.
  Old performance counter sample data files, generated on versions of the
  operating system prior to DIGITAL UNIX Version 4.0, must be analyzed as if
  they contained PC-sampling data.

RESTRICTIONS

  The -pixstats option models execution assuming a perfect memory system.
  Memory system events such as cache misses will increase execution time
  above the -pixstats predictions.

  The set of statistics	reported by the	-pixstats option and the format	of
  the report are the same as for previous versions of the pixstats(1) com-
  mand,	but note the following:

    +  The labels on disassembled basic	blocks take the	form procedure-name
       (or proc_at_0x... if no symbol is available) for	an initial block and
       procedure-name+offset for subsequent blocks.

    +  All reported cycles reflect CPU pipeline	interlocks, so they usually
       do not match the	reported instruction counts.

    +  If not all the shared objects used by a program are profiled, the
       procedure-call counts may be smaller than the jsr/bsr instruction
       counts.

FILES

  crt0.o
      Normal startup code

  mcrt0.o
      Startup code for PC-sampling

  libprof1.a
      Library for PC-sampling

  kmon.out
      Default kprofile data file

  mon.out
      Default PC-sampling data file

  umon.out
      Default uprofile data file

SEE ALSO

  Introduction:	prof_intro(1)

  Commands:  as(1), cc(1), gprof(1), pixie(1), uprofile(1), kprofile(1),
  dxprof(1).  (dxprof is available as an option.)

  Functions:  monitor(3), profil(2)

  Programmer's Guide