 awk(1)								      awk(1)




 NAME
      awk - pattern-directed scanning and processing language

 SYNOPSIS
      awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...]

 DESCRIPTION
      awk scans each input file for lines that match any of a set of
      patterns specified literally in program or in one or more files
      specified as -f progfile.	 With each pattern there can be an
      associated action that is to be performed when a line in a file
      matches the pattern.  Each line is matched against the pattern portion
      of every pattern-action statement, and the associated action is
      performed for each matched pattern.  The file name - means the
      standard input.  Any file of the form var=value is treated as an
      assignment, not a filename.  An assignment is evaluated at the time it
      would have been opened if it were a filename, unless the -v option is
      used.

      An input line is made up of fields separated by white space, or by
      regular expression FS.  The fields are denoted $1, $2, ...; $0 refers
      to the entire line.
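
      As a brief illustration of default field splitting, the following
      sketch can be run from a POSIX shell.  The input text and the shell
      variable name out are invented for the example and are not part of
      awk.

```shell
# Default splitting: runs of blanks separate fields and leading blanks
# are discarded when fields are counted, but $0 keeps the line as read.
out=$(printf '  alpha  beta gamma\n' | awk '{ print $2; print NF; print $0 }')
printf '%s\n' "$out"
```

      Here $2 is beta and NF (described under Variable Names) is 3, while
      $0 still carries the leading blanks.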

    Options
      awk recognizes the following options and arguments:

	   -F fs	  Specify regular expression used to separate
			  fields.  The default is to recognize space and tab
			  characters, and to discard leading spaces and
			  tabs.	 If the -F option is used, leading input
			  field separators are no longer discarded.

	   -f progfile	  Specify an awk program file.	Up to 100 program
			  files can be specified.  The pattern-action
			  statements in these files are executed in the same
			  order as the files were specified.

	   -v var=value	  Cause var=value assignment to occur before the
			  BEGIN action (if it exists) is executed.
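
      The following sketch, assuming a POSIX shell, shows the effect of
      -F on a leading separator and the difference between -v and an
      assignment operand.  The input data and the names x, out1 and out2
      are illustrative only.

```shell
# With -F, a leading separator is kept: ":a:b" has an empty first field.
out1=$(printf ':a:b\n' | awk -F: '{ print NF; print $2 }')
printf '%s\n' "$out1"

# -v assigns before BEGIN runs.  The operand x=2 is assigned only when
# awk reaches it in the argument list, just before reading "-" (stdin).
out2=$(printf 'line\n' | awk -v x=1 'BEGIN { print "begin", x } { print "main", x }' x=2 -)
printf '%s\n' "$out2"
```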

    Statements
      A pattern-action statement has the form:

	   pattern { action }

      A missing { action } means print the line; a missing pattern always
      matches.	Pattern-action statements are separated by new-lines or
      semicolons.
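
      A short sketch of the two defaults, assuming a POSIX shell; the
      sample lines and variable names are made up for illustration.

```shell
# Pattern without an action: each matching line is printed.
out1=$(printf 'apple\nbanana\ncherry\n' | awk '/an/')
printf '%s\n' "$out1"

# Action without a pattern: the action runs for every line.
out2=$(printf 'apple\nbanana\n' | awk '{ print NR ": " $0 }')
printf '%s\n' "$out2"
```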

      An action is a sequence of statements.  A statement can be one of the
      following:



 Hewlett-Packard Company	    - 1 -   HP-UX Release 11i: November 2000

	   if(expression) statement [ else statement ]
	   while(expression) statement
	   for(expression;expression;expression) statement
	   for(var in array) statement
	   do statement while(expression)
	   break
	   continue
	   {[statement	...]}
	   expression	       # commonly  var = expression
	   print [expression-list] [ > expression]
	   printf format [, expression-list] [ > expression]
	   return [expression]
	   next		       # skip remaining patterns on this input line.
	   delete array[expression]   # delete an array element.
	   exit [expression]   # exit immediately; status is expression.
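
      Several of these statements can be seen together in the sketch
      below, which assumes a POSIX shell.  The input data, the array name
      count and the exit status 3 are arbitrary choices for illustration.

```shell
out=$(printf '3 x\nskip me\n2 y\n' | awk '
$1 == "skip" { next }                      # skip remaining patterns for this line
{
    for (i = 1; i <= $1; i++) count[$2]++  # C-style for loop
}
END {
    delete count["x"]                      # remove one array element
    for (k in count) print k, count[k]     # loop over the remaining subscripts
    exit 3                                 # exit status is the expression
}')
status=$?
printf '%s (status %s)\n' "$out" "$status"
```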

      Statements are terminated by semicolons, newlines or right braces.  An
      empty expression-list stands for $0.  String constants are quoted
      (""), with the usual C escapes recognized within.	 Expressions take on
      string or numeric values as appropriate, and are built using the
      operators +, -, *, /, %, ^ (exponentiation), and concatenation
      (indicated by a blank).  The operators ++, --, +=, -=, *=, /=, %=, ^=,
      **=, >, >=, <, <=, ==, !=, and ?: are also available in expressions.
      Variables can be scalars, array elements (denoted x[i]) or fields.
      Variables are initialized to the null string.  Array subscripts can be
      any string, not necessarily numeric (this allows for a form of
      associative memory).  Multiple subscripts such as [i,j,k] are
      permitted.  The constituents are concatenated, separated by the value
      of SUBSEP.
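
      The sketch below, assuming a POSIX shell, exercises a few operators,
      an uninitialized variable and a multiple subscript.  The names
      unset, grid, part and out are hypothetical.

```shell
out=$(awk 'BEGIN {
    print 2 ^ 3, 7 % 3, "ab" "cd"     # exponentiation, remainder, concatenation
    isnull = (unset == "")            # an unassigned variable is the null string
    print isnull, unset + 0           # and acts as 0 in arithmetic
    grid["r1", "c2"] = 5              # stored under "r1" SUBSEP "c2"
    for (k in grid) {
        split(k, part, SUBSEP)        # recover the two constituents
        print part[1], part[2], grid[k]
    }
}')
printf '%s\n' "$out"
```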

      The print statement prints its arguments on the standard output (or on
      a file if >file or >>file is present or on a pipe if |cmd is present),
      separated by the current output field separator, and terminated by the
      output record separator.	file and cmd can be literal names or
      parenthesized expressions.  Identical string values in different
      statements denote the same open file.  The printf statement formats
      its expression list according to the format (see printf(3)).
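
      As a sketch of output redirection, assuming a POSIX shell with
      mktemp and sort available: the file name copy.txt, the variable dir
      and the sample input are invented for the example.

```shell
tmp=$(mktemp -d)
sorted=$(printf 'b\na\nc\n' | awk -v dir="$tmp" '{
    print $0 > (dir "/copy.txt")   # parenthesized expression names the file
    print $0 | "sort"              # same string each time, so one pipe is used
}')
copy=$(cat "$tmp/copy.txt")
rm -rf "$tmp"
printf 'sorted:\n%s\ncopy:\n%s\n' "$sorted" "$copy"
```

      The file receives the lines in input order, while the pipe to sort
      delivers them in sorted order once awk finishes.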

    Built-In Functions
      The built-in function close(expr) closes the file or pipe expr opened
      by a print or printf statement or a call to getline with the same
      string-valued expr.  This function returns zero if successful;
      otherwise, it returns non-zero.
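
      One use of close is to rerun a command that has already been read
      through a pipe.  The following is a minimal sketch assuming a POSIX
      shell; the command string and variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first
    rc = close(cmd)        # zero here, since the pipe closed successfully
    cmd | getline second   # the same string now starts the command again
    close(cmd)
    print first, second, rc
}')
printf '%s\n' "$out"
```

      Without the first close, the second getline would find the pipe
      already at end of file and leave second unset.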

      The customary functions exp, log, sqrt, sin, cos, atan2 are built in.
      Other built-in functions are:

	 blength[([s])]	   Length of its associated argument (in bytes)
			   taken as a string, or of $0 if no argument.

	 length[([s])]	   Length of its associated argument (in characters)
			   taken as a string, or of $0 if no argument.

	 rand()		   Returns a random number between zero and one.

	 srand([expr])	   Sets the seed value for rand, and returns the
			   previous seed value.	 If no argument is given,
			   the time of day is used as the seed value;
			   otherwise, expr is used.

	 int(x)		   Truncates to an integer value.

	 substr(s, m [, n])
			   Return the at most n-character substring of s
			   that begins at position m, numbering from 1.	 If
			   n is omitted, the substring is limited by the
			   length of string s.

	 index(s, t)	   Return the position, in characters, numbering
			   from 1, in string s where string t first occurs,
			   or zero if it does not occur at all.

	 match(s, ere)	   Return the position, in characters, numbering
			   from 1, in string s where the extended regular
			   expression ere occurs, or 0 if it does not.	The
			   variables RSTART and RLENGTH are set to the
			   position and length of the matched string.

	 split(s, a[, fs]) Splits the string s into array elements a[1],
			   a[2], ..., a[n], and returns n.  The separation
			   is done with the regular expression fs, or with
			   the field separator FS if fs is not given.

	 sub(ere, repl [, in])
			   Substitutes repl for the first occurrence of the
			   extended regular expression ere in the string in.
			   If in is not given, $0 is used.

	 gsub		   Same as sub except that all occurrences of the
			   regular expression are replaced; sub and gsub
			   return the number of replacements.

	 sprintf(fmt, expr, ...)
			   String resulting from formatting expr ...
			   according to the printf(3S) format fmt.

	 system(cmd)	   Executes cmd and returns its exit status.

	 toupper(s)	   Converts the argument string s to uppercase and
			   returns the result.

	 tolower(s)	   Converts the argument string s to lowercase and
			   returns the result.
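
      The string functions above can be tried with a sketch such as the
      following, assuming a POSIX shell.  The test string and variable
      names are made up; the comments give the value each line prints.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                    # 11
    print substr(s, 7, 3)              # wor
    print index(s, "o")                # 5
    pos = match(s, /wor/)
    print pos, RSTART, RLENGTH         # 7 7 3
    n = split("a:b:c", part, ":")
    print n, part[3]                   # 3 c
    t = s
    n = gsub(/o/, "0", t)
    print n, t                         # 2 hell0 w0rld
    n = sub(/l+/, "L", t)
    print n, t                         # 1 heL0 w0rld
    print toupper(s), int(-3.7)        # HELLO WORLD -3
    print sprintf("%05.1f", 3.14159)   # 003.1
}')
printf '%s\n' "$out"
```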

      The built-in function getline sets $0 to the next input record from
      the current input file; getline < file sets $0 to the next record from
      file.  getline x sets variable x instead.	 Finally, cmd | getline
      pipes the output of cmd into getline; each call of getline returns the
      next line of output from cmd.  In all cases, getline returns 1 for a
      successful input, 0 for end of file, and -1 for an error.
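
      A sketch of three getline forms and the end-of-file return value,
      assuming a POSIX shell.  The input records, the command string and
      the names saved, line and rc are illustrative only.

```shell
out=$(printf 'first\nsecond\nthird\n' | awk '
NR == 1 {
    getline                        # $0 becomes the next record
    print "record:", $0
    getline saved                  # the next record goes to a variable
    print "saved:", saved
    while (("echo one; echo two" | getline line) > 0)
        print "cmd:", line         # one line of command output per call
    rc = getline                   # no records remain, so this returns 0
    print "at end:", rc
}')
printf '%s\n' "$out"
```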

    Patterns
      Patterns are arbitrary Boolean combinations (with ! || &&) of regular
      expressions and relational expressions.  awk supports Extended Regular
      Expressions as described in regexp(5).  Isolated regular expressions
      in a pattern apply to the entire line.  Regular expressions can also
      occur in relational expressions, using the operators ~ and !~.  /re/
      is a constant regular expression; any string (constant or variable)
      can be used as a regular expression, except in the position of an
      isolated regular expression in a pattern.

      A pattern can consist of two patterns separated by a comma; in this
      case, the action is performed for all lines from an occurrence of the
      first pattern through an occurrence of the second.

      A relational expression is one of the following:

		expression matchop regular-expression
		expression relop expression
		expression in array-name
		(expr,expr,...) in array-name

      where a relop is any of the six relational operators in C, and a
      matchop is either ~ (matches) or !~ (does not match).  A conditional
      is an arithmetic expression, a relational expression, or a Boolean
      combination of the two.

      The special patterns BEGIN and END can be used to capture control
      before the first input line is read and after the last.  BEGIN and END
      do not combine with other patterns.
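
      The pattern forms described above can be combined in one program.
      The following sketch assumes a POSIX shell; the input lines and the
      array name seen are invented for the example.

```shell
out=$(printf '1 apple\n2 start\n3 pear\n4 stop\n5 plum\n' | awk '
BEGIN               { seen["pear"] = 1; print "begin" }
/start/, /stop/     { print "range:", $0 }   # from a start line through a stop line
$1 > 4 && $2 ~ /^p/ { print "both:", $0 }    # relational and match, combined
$2 in seen          { print "known:", $0 }   # subscript membership test
END                 { print "records:", NR }')
printf '%s\n' "$out"
```

      Each input line is tested against every pattern in order, so the
      line 3 pear produces both a range line and a known line.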

    Special Characters
      The following special escape sequences are recognized by awk in both
      regular expressions and strings:

	   Escape    Meaning
	   \a	     alert character
	   \b	     backspace character
	   \f	     form-feed character
	   \n	     new-line character
	   \r	     carriage-return character
	   \t	     tab character
	   \v	     vertical-tab character
	   \nnn	     1- to 3-digit octal value nnn
	   \xhhh     1- to n-digit hexadecimal number

    Variable Names
      Variable names with special meanings are:

	   FS		     Input field separator regular expression; a
			     space character by default; also settable by
			     option -Ffs.

	   NF		     The number of fields in the current record.

	   NR		     The ordinal number of the current record from
			     the start of input. Inside a BEGIN action the
			     value is zero. Inside an END action the value
			     is the number of the last record processed.

	   FNR		     The ordinal number of the current record in the
			     current file. Inside a BEGIN action the value
			     is zero. Inside an END action the value is the
			     number of the last record processed in the last
			     file processed.

	   FILENAME	     A pathname of the current input file.

	   RS		     The input record separator; a newline character
			     by default.

	   OFS		     The print statement output field separator; a
			     space character by default.

	   ORS		     The print statement output record separator; a
			     newline character by default.

	   OFMT		     Output format for numbers (default %.6g).	If
			     the value of OFMT is not a floating-point
			     format specification, the results are
			     unspecified.

	   CONVFMT	     Internal conversion format for numbers (default
			     %.6g).  If the value of CONVFMT is not a
			     floating-point format specification, the
			     results are unspecified.

	   SUBSEP	     The subscript separator string for
			     multi-dimensional arrays; the default value is
			     "\034".

	   ARGC		     The number of elements in the ARGV array.

	   ARGV		     An array of command line arguments, excluding
			     options and the program argument, numbered from
			     zero to ARGC-1.

			     The arguments in ARGV can be modified or added
			     to; ARGC can be altered. As each input file
			     ends, awk will treat the next non-null element
			     of ARGV, up to the current value of ARGC-1,
			     inclusive, as the name of the next input file.
			     Thus, setting an element of ARGV to null means
			     that it will not be treated as an input file.
			     The name - indicates the standard input. If an
			     argument matches the format of an assignment
			     operand, this argument will be treated as an
			     assignment rather than a file argument.

	   ENVIRON	     Array of environment variables; subscripts are
			     names.  For example, if environment variable
			     V=thing, ENVIRON["V"] produces thing.

	   RSTART	     The starting position of the string matched by
			     the match function, numbering from 1. This is
			     always equivalent to the return value of the
			     match function.

	   RLENGTH	     The length of the string matched by the match
			     function.
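
      A few of these variables are shown in the sketch below, which
      assumes a POSIX shell.  The input data and the shell variable names
      are illustrative; V=thing follows the ENVIRON example above.

```shell
# NR, NF and OFS: print joins its arguments with the output field separator.
out1=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $NF }')
printf '%s\n' "$out1"

# ENVIRON: subscripts are environment variable names.
out2=$(V=thing awk 'BEGIN { print ENVIRON["V"] }')
printf '%s\n' "$out2"

# OFMT applies to non-integer numbers printed by print.
out3=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 42 }')
printf '%s\n' "$out3"
```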

      Functions can be defined (at the position of a pattern-action
      statement) as follows:

	   function foo(a, b, c) { ...; return x }

      Parameters are passed by value if scalar, and by reference if array
      name.  Functions can be called recursively.  Parameters are local to
      the function; all other variables are global.
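
      The sketch below, assuming a POSIX shell, shows a recursive
      function and an array passed by reference.  The function names fact
      and fill are hypothetical, and the extra parameter i is declared
      only to serve as a local variable, a common awk convention.

```shell
out=$(awk '
function fact(n) {                 # recursion on a scalar passed by value
    if (n <= 1) return 1
    return n * fact(n - 1)
}
function fill(arr, n,    i) {      # arr is passed by reference; i is local
    for (i = 1; i <= n; i++) arr[i] = i * i
}
BEGIN {
    fill(sq, 3)                    # the caller sees the elements set in fill
    print fact(5), sq[1], sq[2], sq[3]
}')
printf '%s\n' "$out"
```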

      Note that if pattern-action statements are used in an HP-UX command
      line as an argument to the awk command, the pattern-action statement
      must be enclosed in single quotes to protect it from the shell.  For
      example, to print lines longer than 72 characters, the pattern-action
      statement as used in a script (-f progfile command form) is:

	   length > 72

      The same pattern-action statement used as an argument to the awk
      command is quoted in this manner:

	   awk 'length > 72'





 EXTERNAL INFLUENCES
    Environment Variables
      LANG	     Provides a default value for the internationalization
		     variables that are unset or null.	If LANG is unset or
		     null, the default value of "C" (see lang(5)) is used.
		     If any of the internationalization variables contains
		     an invalid setting, awk will behave as if all
		     internationalization variables are set to "C".  See
		     environ(5).

      LC_ALL	     If set to a non-empty string value, overrides the
		     values of all the other internationalization variables.

      LC_CTYPE	     Determines the interpretation of text as single and/or
		     multi-byte characters, the classification of characters
		     as printable, and the characters matched by character
		     class expressions in regular expressions.

      LC_NUMERIC     Determines the radix character used when interpreting
		     numeric input, performing conversion between numeric
		     and string values and formatting numeric output.
		     Regardless of locale, the period character (the
		     decimal-point character of the POSIX locale) is the
		     decimal-point character recognized in processing awk
		     programs (including assignments in command-line
		     arguments).

      LC_COLLATE     Determines the locale for the behavior of ranges,
		     equivalence classes and multi-character collating
		     elements within regular expressions.

      LC_MESSAGES    Determines the locale that should be used to affect the
		     format and contents of diagnostic messages written to
		     standard error and informative messages written to
		     standard output.

      NLSPATH	     Determines the location of message catalogues for the
		     processing of LC_MESSAGES.

      PATH	     Determines the search path when looking for commands
		     executed by system(cmd), or input and output pipes.

      In addition, all environment variables will be visible via the awk
      variable ENVIRON.

    International Code Set Support
      Single- and multi-byte character code sets are supported except that
      variable names must contain only ASCII characters and regular
      expressions must contain only valid characters.





 DIAGNOSTICS
      awk supports up to 199 fields ($1, $2, ..., $199) per record.

 EXAMPLES
      Print lines longer than 72 characters:

	   length > 72

      Print first two fields in opposite order:

	   { print $2, $1 }

      Same, with input fields separated by comma and/or blanks and tabs:

	   BEGIN { FS = ",[ \t]*|[ \t]+" }
		 { print $2, $1 }

      Add up first column, print sum and average:

		   { s += $1 }
	   END	   { print "sum is", s, " average is", s/NR }

      Print all lines between start/stop pairs:

	   /start/, /stop/

      Simulate echo command (see echo(1)):

	   BEGIN   {				 # Simulate echo(1)
		   for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
		   printf "\n"
		   exit }

 AUTHOR
      awk was developed by AT&T, IBM, OSF, and HP.

 SEE ALSO
      lex(1), sed(1).

      A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming
      Language, Addison-Wesley, 1988.

 STANDARDS CONFORMANCE
      awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2