Manual: (OSF1-V5.1-alpha)



awk(1)								       awk(1)



NAME

  awk -	Pattern	scanning and processing	language

SYNOPSIS

  awk [-F ERE] [-f program_file]... [-v	var=val]... [argument]...

  awk [-F ERE] [-v var=val]... ['program_text']	[argument]...

STANDARDS

  Interfaces documented	on this	reference page conform to industry standards
  as follows:

  awk:	XCU5.0

  Refer	to the standards(5) reference page for more information	about indus-
  try standards	and associated tags.

OPTIONS

  -F ERE
      Defines ERE (extended regular expression)	as the value of	the input
      field separator before any input is read.	 Using this option is compar-
      able to assigning	a value	to the built-in	variable FS.

  -f program_file
      Specifies the pathname (program_file) of a file containing an awk
      program.  If multiple instances of this option are specified, the
      concatenation of the files specified as program_file in the order
      specified is the awk program. The awk program can alternatively be
      specified on the command line as the single argument program_text.

  -v var=val
      The var=val argument is an assignment operand that specifies a value
      (val) for	a variable (var). The specified	variable assignment occurs
      prior to executing the awk program, including the	actions	associated
      with BEGIN patterns (if any are in the program).	Multiple occurrences
      of the -v	option can be specified	on the awk command line.
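
  As a minimal sketch of how -F and -v combine, the following command
  splits one colon-separated record and prints a variable that was assigned
  before the BEGIN action ran.  The sample record and the variable name
  label are made up for illustration.

```shell
# -F: sets the field separator before any input is read.
# -v label=user assigns the (hypothetical) variable label before BEGIN runs.
opt_out=$(printf 'root:x:0:0\n' |
    awk -F: -v label=user 'BEGIN { print "label is", label }
                           { print label, $1, $3 }')
echo "$opt_out"
```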

OPERANDS

  'program_text'
      If -f program_file is not	specified, the first parameter to awk is
      program_text, delimited by single	quotation (') characters.

      See the DESCRIPTION section for the processing of	this parameter.

  argument
      The following two	types of argument can be intermixed:

      input_file
	  A pathname of	a file that contains the input to be read, which is
	  matched against the set of patterns in the program.  If no
	  input_file operands are specified, or	if the input_file argument is
	  -, standard input is used.

      var=val
	  The characters before	the = represent	the name of an awk variable.
	  If that name is an awk reserved word,	the behavior is	undefined.
	  The characters following the = are interpreted as if they appeared
	  in the awk program preceded and followed by a	double quotation (")
	  character, in	other words, as	a string value.	 If the	value is con-
	  sidered a numeric string, the	variable is assigned a numeric value.
          Each such variable assignment occurs just prior to the processing
          of the following input_file, if any.  Thus, an assignment before
          the first input_file argument is executed after the BEGIN actions
          (if any), while an assignment after the last input_file argument
          occurs before the END actions (if any).  If there are no
          input_file arguments, assignments are executed before processing
          the standard input.

DESCRIPTION

  The awk command executes programs written in the awk programming language,
  a powerful pattern matching utility for textual data manipulation.  An awk
  program is a sequence of patterns and corresponding actions; each action is
  carried out when its pattern matches an input record.  The awk command is a
  more powerful tool for text manipulation than either sed or grep.

  The awk command:

    +  Performs	convenient numeric processing

    +  Allows variables	within actions

    +  Allows general selection	of patterns

    +  Allows control flow in the actions

    +  Does not	require	any compiling of programs

  The pattern-matching and action statements of	the awk	language can be
  specified either on the command line or in a program file.  In either	case,
  the awk command first	reads all program statements.

  If -f	program_file is	not specified, the first operand to awk	is
  program_text,	delimited by single quotation (') characters.

  Execution of an awk program starts by executing the actions associated with
  all BEGIN patterns in the order they occur in the program.  Then, each
  input_file operand (or standard input if no input file is specified) is
  processed in turn by:

    +  Reading input data until	a record separator is seen (a newline charac-
       ter by default)

    +  Splitting the current record into fields	using the current value	of FS

    +  Evaluating each pattern in the program in the order of occurrence

    +  Executing the action associated with each pattern that matches the
       current record

       The action for a matching pattern is executed before evaluating
       subsequent patterns.  After all input has been processed, the actions
       associated with all END patterns are executed in program order.

  Refer	to the EXAMPLES	section	for an example that demonstrates the results
  of specifying	a variable assignment as a flag	argument or command argument
  in different positions on the	awk command line.

  The awk command reads	input data in the order	stated on the command line.
  If you specify input_file as a - (dash) or do	not specify a filename,	awk
  reads	standard input.

  The awk command reads	input data from	any of the following sources:

    +  Any input_file operands or their	equivalents, which can be affected by
       modifying the awk variables ARGV	and ARGC

    +  Standard	input, in the absence of any input_file	operands

    +  Arguments to the	getline	function

  Input	files must be text files.  When	the built-in variable RS is set	to a
  value	other than a newline character,	awk supports records terminated	with
  the specified	separator up to	LINE_MAX bytes.

  Pattern-action statements on the command line	are enclosed in	' (single
  quote	characters) to protect them from interpretation	by the shell.  Con-
  secutive pattern-action statements on	the same command line are separated
  by a ; (semicolon), within one set of	quote delimiters.

  By default, the awk command treats input lines as records and splits each
  record into fields separated by spaces, tabs, or a field separator you set
  with the FS variable.  (When a space character is the field separator,
  multiple spaces are recognized as a single separator.)  Fields are
  referenced as $1, $2, and so on.  The reference $0 specifies the entire
  record (by default, a line).
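
  The default field splitting can be sketched as follows.  The input line is
  invented for illustration; it mixes a run of spaces with a tab to show that
  both are treated as a single separator while $0 keeps the record unchanged.

```shell
# Default splitting: runs of spaces and tabs count as one separator.
# Prints the field count, the second field, then the untouched record.
fields_out=$(printf 'alpha   beta\tgamma\n' |
    awk '{ print NF; print $2; print $0 }')
echo "$fields_out"
```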

  Program Structure


  An awk program is composed of pairs of the form:

  pattern { action}

  Either the pattern or	the action (including the enclosing brace characters)
  can be omitted.

  If pattern lacks a corresponding action, awk writes each record that
  matches the pattern to standard output.  If action lacks a corresponding
  pattern, awk applies the action to every record.
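
  The two short forms can be sketched side by side.  The three input lines
  are sample data chosen for illustration.

```shell
# Pattern without action: each matching record is written unchanged.
pat_only=$(printf 'one\ntwo\nthree\n' | awk '/t/')
# Action without pattern: the action runs for every record.
act_only=$(printf 'one\ntwo\nthree\n' | awk '{ print length($0) }')
echo "$pat_only"
echo "$act_only"
```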

  Actions


  An action is a sequence of statements	that follow C language syntax.	Any
  single statement can be replaced by a	statement list enclosed	in braces.
  When statement is a list of statements, they must be separated by newline
  characters or	semicolons, and	are executed sequentially in order of appear-
  ance.	 Statements in the awk language	include:

       break
       continue
       delete array[expression]
       exit [expression]
       for (expression;expression;expression) statement
       for (variable in	array) statement
       if (expression) statement [else statement]
       next
       print [expression_list][>file|>>file][| command]
       printf format[, expression_list][>file|>>file][| command]
       while (expression) statement
       variable=expression

  Statements can end with a semicolon, a newline character, or the right
  brace	enclosing the action:

       { [ statement ... ] }

  Expressions can have string or numeric values and are built using the
  operators +, -, *, /, %, a space for string concatenation, and the C
  operators ++, --, +=, -=, *=, /=, =, ^=, ?:, >, >=, <, <=, ==, $, (), ~, !~,
  in, ||, &&, !, and !=.

  Because the actions process fields, input white space	is not preserved in
  the output.

  The file and command arguments in awk	statements can be literal names	or
  expressions enclosed in double quotation (") characters.  Identical string
  values in different statements refer to the same open	file.

  The print statement writes its arguments to standard output (or to a file
  if > file or >> file is present), separated by the current output field
  separator and terminated by the current output record separator.

  The printf statement formats its expression list according to the format of
  the printf subroutine and writes the result to standard output.  Unlike
  print, printf does not insert the output field separator or append the
  output record separator; any separators or newline characters must appear
  in the format string.  You can redirect the output into a file using the
  print ... > file or printf ... > file statements.
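
  The following sketch shows how OFS and ORS shape the output of print,
  alongside a printf call whose output is determined entirely by its format
  string.  The separator values are arbitrary choices for illustration.

```shell
# print joins its arguments with OFS and ends the record with ORS;
# printf writes exactly what the format string describes.
print_out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"
    printf "%s+%s\n", "a", "b"
}')
echo "$print_out"
```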

  Variables


  Variables can	be scalars, array elements (denoted x[i]), or fields.  With
  the exception	of function parameters,	variables are not explicitly
  declared.

  Variable names can consist of	uppercase and lowercase	alphabetic letters,
  the underscore character, the	digits (0 to 9), and extended characters.
  Variable names cannot	begin with a digit. Field variables are	designated by
  $ (dollar sign), followed by a number	or numerical expression.  The effect
  of the field number expression evaluating to anything	other than a non-
  negative integer is unspecified.

  Variables are	initialized to the null	string.	 Array subscripts can be any
  string; they do not have to be numeric.  This	allows for a form of associa-
  tive memory.	Enclose	string constants in expressions	in double quotation
  (") characters.
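
  As a small sketch of associative arrays, the program below counts how often
  each input word occurs, relying on uninitialized elements starting out
  empty.  The array name count and the sample words are illustrative only.

```shell
# count[] is indexed by strings; elements spring into existence on first use.
# The "in" operator tests membership without creating an element.
count_out=$(printf 'red\nblue\nred\n' | awk '
    { count[$1]++ }
    END {
        seen = ("green" in count) ? "yes" : "no"
        print count["red"], count["blue"], seen
    }')
echo "$count_out"
```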

  There	are several variables with special meaning to awk.  They include:

  ARGC
      The number of elements in	the ARGV array.

  ARGV
      An array of command line arguments, excluding options and	the
      program_file arguments, numbered from zero to ARGC-1.

      The arguments in ARGV can	be modified or added to; ARGC can be altered.
      As each input file ends, awk treats the next non-null element of ARGV,
      up to and	including the current value of ARGC-1, as the name of the
      next input file. Therefore, setting an element of ARGV to null means
      that it is not treated as an input file.  When the element is the
      character	-, standard input is specified.	 When the element matches the
      format for an assignment (variable=value), the element is	treated	as an
      assignment rather	than as	the name of an awk input file.

  CONVFMT
      The printf format for converting numbers to strings (except for output
      statements, where OFMT is used); %.6g by default.

  ENVIRON
      The variable ENVIRON is an array representing the	value of the environ-
      ment.  The indexes of the	array are strings consisting of	the names of
      the environmental	variables, and the value of each array element is a
      string consisting	of the value of	that variable.

  FILENAME
      The name of the current input file.  Inside a BEGIN action, the
      FILENAME value is	undefined.  Inside an END action, the value is the
      name of the last input file processed.

  FNR The ordinal number of the	current	input line (record) in the current
      file.  Inside a BEGIN action, the	value is zero. Inside an END action,
      the value	is the number of the last record processed in the last file
      processed.

  FS  Input field separator (default is	a space). If it	is a space, then any
      number of	spaces and tabs	can separate fields.

  NF  The number of fields in the current input	line (record) with a limit of
      199.

  NR  The number of the	current	input line (record).

  OFS The print	statement output field separator (default is a space).

  ORS The print	statement output record	separator (default is a	newline	char-
      acter).

  OFMT
      The printf format for converting numbers to strings in output
      statements (default is %.6g).

  RLENGTH
      The length of the	string matched by the match function.

  RS  Input record separator (default is a newline character).

  RSTART
      The starting position of the string matched by the match function,
      numbering	from 1.	 This is always	equivalent to the return value of the
      match function.

  SUBSEP
      The subscript separator string for multi-dimensional arrays.
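
  A few of these variables can be observed with the sketch below.  The input
  lines, the operand foo, the array name grid, and the environment variable
  GREETING are all made up for illustration; foo is never opened because the
  program contains only a BEGIN action.

```shell
# NR, NF and $NF per record; SUBSEP joins the subscripts of grid[1,2].
vars_out=$(printf 'a b c\nd e\n' | awk '
    { print NR, NF, $NF }
    END {
        grid[1,2] = "x"
        for (k in grid) {
            split(k, parts, SUBSEP)
            print parts[1], parts[2], grid[k]
        }
    }')
# ARGC counts ARGV[0] plus the operands; options are not included.
argv_out=$(awk 'BEGIN { print ARGC, ARGV[1] }' foo)
# ENVIRON is indexed by environment variable name.
env_out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$vars_out"; echo "$argv_out"; echo "$env_out"
```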

  Functions


  There	are a variety of built-in functions that can be	used in	awk actions.

  Arithmetic Functions


  The arithmetic functions, except for int, are	based on the ISO C standard.
  The behavior is undefined in cases where the ISO C standard specifies	that
  an error be returned or that the behavior is undefined.

  atan2	(y,x)
      Returns the arctangent of	y/x.

  cos (x)
      Returns the cosine of x, where x is in radians.

  sin (x)
      Returns the sine of x where x is in radians.

  exp (x)
      Returns the exponential function of x (e raised to the power x).

  log (x)
      Returns the natural logarithm of x.

  sqrt (x)
      Returns the square root of x.

  int (x)
      Truncates its argument to an integer.  It is truncated toward 0 when
      x > 0.

  rand()
      Returns a random number n, such that 0 <= n < 1.

  srand([expr])
      Sets the seed value for rand to expr or uses the time of day if expr is
      omitted.	The previous seed value	is returned.
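
  The arithmetic functions can be exercised with a short sketch such as the
  following.  The argument values and the seed are arbitrary; the final
  value only checks that rand returns a number in the documented range.

```shell
# Integral results print without a fractional part.
math_out=$(awk 'BEGIN {
    srand(1)                       # fixed seed, chosen arbitrarily
    r = rand()
    in_range = (r >= 0 && r < 1)   # 1 when rand() is within [0, 1)
    print int(3.9), sqrt(16), exp(0), log(1), atan2(0, 1), in_range
}')
echo "$math_out"
```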

  String Functions


  gsub(ere, repl[, in])
      Behaves like sub (see below), except that it replaces all occurrences
      of the regular expression (like the ed utility global substitute) in $0
      or in the in argument, when specified.

  index(s, t)
      Returns the position, in characters, numbering from 1, in	string s
      where string t first occurs, or zero if it does not occur	at all.

  length[([s])]
      Returns the length, in characters, of its	argument taken as a string,
      or of the	whole record, $0, if there is no argument.

  match(s, ere)
      Returns the position, in characters, numbering from 1, in	string s
      where the	extended regular expression ere	occurs,	or zero	if it does
      not occur	at all.	 RSTART	is set to the starting position, zero if no
      match is found; RLENGTH is set to	the length of the matched string, -1
      if no match is found.

  split(s, a[, fs])
      Splits the string s into array elements a[1], a[2], ...  a[n], and
      returns n. The separation is done with the extended regular expression
      fs or with the field separator FS	if fs is not given.  Each array	ele-
      ment has a string	value when created.  If	the string assigned to any
      array element, with any occurrence of the	decimal	point character	from
      the current locale changed to a period character,	would be considered a
      numeric string, the array	element	also has the numeric value of the
      numeric string.  The effect of a null string as the value	of fs is
      unspecified.

  sprintf(fmt, expr, expr, ...)
      Formats the expressions according to the printf format given by fmt and
      returns the resulting string.

  sub(ere, repl[, in])
      Substitutes the string repl in place of the first instance of the
      extended regular expression ere in string in and returns the number of
      substitutions.  An ampersand (&) appearing in the string repl is
      replaced by the string from in that matches the regular expression.
      For each occurrence of backslash (\) encountered when scanning the
      string repl from beginning to end, the next character is taken
      literally and loses its special meaning (for example, \& is interpreted
      as a literal ampersand character).  Except for & and \, it is
      unspecified what the special meaning of any such character is.  If in
      is specified and it is not an lvalue, the behavior is undefined.  If in
      is omitted, awk substitutes in the current record ($0).

  substr(s, m[,n])
      Returns the at most n character substring	of s that begins at position
      m, numbering from	1.  If n is missing, the length	of the substring is
      limited by the length of the string s.

  tolower(s)
      Returns a	string based on	the string s.  Each character in s that	is an
      upper case letter specified to have a tolower mapping by the LC_CTYPE
      category of the current locale is	replaced in the	returned string	by
      the lower	case letter specified by the mapping.  Other characters	in s
      are unchanged in the returned string.

  toupper(s)
      Returns a	string based on	the string s.  Each character in s that	is a
      lower case letter specified to have a toupper mapping by the LC_CTYPE
      category of the current locale is	replaced in the	returned string	by
      the upper	case letter specified by the mapping.  Other characters	in s
      are unchanged in the returned string.
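
  The string functions can be tried out with a sketch like the one below.
  The sample string and the variable names are illustrative; copies of the
  string are modified so that the original stays intact for later calls.

```shell
str_out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")              # position of "wor": 7
    print length(s)                    # 11 characters
    print substr(s, 1, 5)              # hello
    pos = match(s, /o+/)               # sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH         # 5 5 1
    n = split("a:b:c", parts, ":")
    print n, parts[2]                  # 3 b
    t = s; c = gsub(/o/, "0", t)       # replace every o in the copy t
    print c, t                         # 2 hell0 w0rld
    u = s; sub(/world/, "[&]", u)      # & stands for the matched text
    print u                            # hello [world]
    print toupper(s)                   # HELLO WORLD
}')
echo "$str_out"
```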

  Input/Output and General Functions


  close(expression)
      Closes the file or pipe opened by	a print	or printf statement or a call
      to getline with the same string-valued expression.  If the close was
      successful, the function returns zero; otherwise,	it returns non-zero.

  expression | getline [var]
      Reads a record of input from a stream piped from the output of a
      command.  The stream is created if no stream is currently open with the
      value of expression as its command name.  The stream created is
      equivalent to one created by a call to the popen function with the
      value of expression as the command argument and a value of r as the
      mode argument.  As long as the stream remains open, subsequent calls in
      which expression evaluates to the same string read subsequent records
      from the stream. The stream remains open until the close function is
      called with an expression that evaluates to the same string value.  At
      that time, the stream is closed as if by a call to the pclose function.
      If var is	missing, $0 and	NF are set; otherwise, var is set.

  getline
      Sets $0 to the next input	record from the	current	input file.  This
      form of getline sets the NF, NR, and FNR variables.

  getline var
      Sets variable var	to the next input record from the current input	file.
      This form	of getline sets	the FNR	and NR variables.

  getline [var] < expression
      Reads the	next record of input from a named file.	 The expression	is
      evaluated	to produce a string that is used as a full pathname.  If the
      file of that name	is not currently open, it is opened.  As long as the
      stream remains open, subsequent calls in which expression	evaluates to
      the same string value, read subsequent records from the file.  The file
      remains open until the close function is called with an expression that
      evaluates	to the same string value.  If var is missing, $0 and NF	are
      set; otherwise, var is set.

  system(expression)
      Executes the command given by expression in a manner equivalent to the
      system function and returns the exit status of the command.

  All forms of getline return 1	for successful input, zero for end of file,
  and -1 for an	error.

  The getline function sets $0 to the next input record from the current
  input file; getline < file sets $0 to the next record from file.  The form
  getline x sets variable x instead.  Finally, command | getline pipes the
  output of command into getline; each call of getline returns the next line
  of output from command.


  Where	strings	are used as the	name of	a file or pipeline, the	strings	must
  be textually identical.  The terminology "same string	value" implies that
  "equivalent strings",	even those that	differ only by space characters,
  represent different files.
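
  Reading from a command with getline and then closing the stream can be
  sketched as follows.  The command string and the variable names are
  illustrative; the same string value is passed to close so that it refers
  to the same stream.

```shell
# Read every line produced by the command, then close the pipe.
gl_out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)     # 1 = record read, 0 = end of file
        lines = lines "[" line "]"
    status = close(cmd)                  # zero when the close succeeds
    print lines, status
}')
echo "$gl_out"
```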

  User-defined Functions


  The awk language also	provides user-defined functions.  Such functions can
  be defined as:

  function name(args,...) { statements }

  A function can be referred to	anywhere in an awk program; in particular,
  the function's use can precede the function definition.  The scope of	a
  function is global.

  Function arguments can be either scalars or arrays; the behavior is unde-
  fined	if an array name is passed as an argument that the function uses as a
  scalar, or if	a scalar expression is passed as an argument that the func-
  tion uses as an array.  Function arguments are passed	by value if scalar
  and by reference if array name.  Argument names are local to the function;
  all other variable names are global. The same name must not be used as both
  an argument name and as the name of a function or special awk variable.  The
  same name must not be	used both as a variable	name with global scope and as
  the name of a	function.  The same name must not be used within the same
  scope	both as	a scalar variable and as an array.

  The number of	parameters in the function definition need not match the
  number of parameters in the function call.  Excess formal parameters can be
  used as local	variables.  If fewer arguments are supplied in a function
  call than are in the function definition, the extra parameters that are
  used in the function body as scalars are initialized with a string value of
  the null string and a	numeric	value of zero, and the extra parameters	that
  are used in the function body	as arrays are initialized as empty arrays.
  If more arguments are	supplied in a function call than are in	the function
  definition, the behavior is undefined.

  When invoking	a function, no white space can be placed between the function
  name and the opening parenthesis.  Function calls can	be nested and recur-
  sive calls can be made upon functions.  Upon return from any nested or
  recursive function call, the values of all the calling function's parame-
  ters are unchanged, except for array parameters passed by reference.	The
  return statement can be used to return a value.
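
  The rules above can be illustrated with a short sketch.  The function
  names fact and sum are hypothetical; sum declares two extra formal
  parameters (total and i) that act as local variables, and receives its
  array argument by reference.

```shell
fn_out=$(awk '
# Recursive call; n is passed by value.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }
# total and i are extra formal parameters used as locals.
function sum(arr, count,    total, i) {
    for (i = 1; i <= count; i++) total += arr[i]
    return total
}
BEGIN {
    n = split("1 2 3 4", nums, " ")
    print fact(5), sum(nums, n)
}')
echo "$fn_out"
```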


  Patterns


  Patterns are arbitrary Boolean combinations of regular expressions and
  relational expressions (using the !, ||, and && operators and parentheses
  for grouping).  You must start and end regular expressions with slashes.
  You can use regular expressions as described for grep, including the
  following special characters:

  +   One or more occurrences of the pattern.

  ?   Zero or one occurrence of	the pattern.

  |   Either of two alternative patterns.

  ( ) Grouping of expressions.

  Isolated regular expressions in a pattern apply to the entire	line.  Regu-
  lar expressions can occur in relational expressions. Any string (constant
  or variable) can be used as a	regular	expression, except in the position of
  an isolated regular expression in a pattern.

  If two patterns are separated	by a comma, the	action is performed on all
  lines	between	an occurrence of the first pattern and the next	occurrence of
  the second.

  There	are two	types of relational expressions	that you can use.  The first
  type has the form:

  expression  match_operator  pattern

  where	match_operator is either: ~ (for contains) or !~ (for does not con-
  tain).

  The second type has the form:

  expression  relational_operator  expression

  where relational_operator is any of the six C relational operators: <, >,
  <=, >=, ==, and !=.  An expression can be an arithmetic expression, a
  relational expression, or a Boolean combination of these.
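
  The pattern forms described in this section can be sketched as follows.
  The first command combines a relational expression with a match operator;
  the second uses a comma-separated range.  All input data is invented for
  illustration.

```shell
# Boolean combination: second field at least 18 and name not starting with c.
pattern_out=$(printf 'alice 30\nbob 17\ncarol 45\n' |
    awk '$2 >= 18 && $1 !~ /^c/ { print $1 }')
# Range pattern: every record from a start line through the next stop line.
range_out=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/,/stop/')
echo "$pattern_out"
echo "$range_out"
```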

  Special Patterns


  You can use the BEGIN and END special patterns to capture control before
  the first and after the last input line is read, respectively.  By
  convention BEGIN is written first and END last, although this ordering is
  not required.

  Each BEGIN pattern is matched once and its associated action executed
  before the first record of input is read and before command line
  assignment operands are processed.  Each END pattern is matched once and
  its associated action executed after the last record of input has been
  read.  Both patterns must have associated actions.

  BEGIN	and END	do not combine with other patterns.  Multiple BEGIN and	END
  patterns are allowed. The actions associated with the BEGIN patterns are
  executed in the order specified in the program, as are the END actions.  An
  END pattern can precede a BEGIN pattern in a program.
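
  The ordering rules can be observed with a sketch in which the rules are
  deliberately written out of sequence.  The messages and the single input
  record are illustrative.

```shell
# BEGIN actions run first and END actions last, each group in program order,
# regardless of where the rules appear in the program text.
order_out=$(printf 'rec\n' | awk '
    END   { print "end 1" }
    BEGIN { print "begin 1" }
          { print $0 }
    BEGIN { print "begin 2" }
    END   { print "end 2" }')
echo "$order_out"
```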

  You have two ways to designate an extended regular expression	other than
  white	space to separate fields.  You can use the -Fere option	on the com-
  mand line, or	you can	assign a string	with the expression to the built-in
  variable FS.	Either action changes the field	separator to ere.

  There	are no explicit	conversions between numbers and	strings.  To force an
  expression to	be treated as a	number,	add 0 to it.  To force it to be
  treated as a string, append a	null string ("").
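
  The effect of forcing a type can be sketched with comparisons, which are
  done as strings or as numbers depending on the operands.  The variable
  names and values below are arbitrary.

```shell
conv_out=$(awk 'BEGIN {
    a = "10"; b = "9"                   # string constants
    as_strings = (a < b)                # "10" sorts before "9": 1
    as_numbers = (a + 0 < b + 0)        # 10 < 9 is false: 0
    x = 5; y = 10                       # numeric constants
    num_cmp = (x < y)                   # 5 < 10: 1
    str_cmp = ((x "") < (y ""))         # "5" sorts after "10": 0
    print as_strings, as_numbers, num_cmp, str_cmp
}')
echo "$conv_out"
```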




  Comment Delimiter


  In the awk language, a comment starts with the sharp sign character, #, and
  continues to the end of the line.  The # does not have to be the first
  character on the line.  The awk language ignores the rest of the line
  following a sharp sign.  For example:

       # This program prints a nice friendly message. It helps
       # keep novice users from being afraid of the computer.

  The purpose of a comment is to help you or another person understand the
  program at a later time.

EXIT STATUS

  The following	exit values are	returned:

  0   Successful completion.

  >0  An error occurred.

EXAMPLES

   1.  To display the file lines that are longer than 72 bytes,	enter:
            % awk 'length > 72' chapter1

       This command selects each line of the file chapter1 that	is longer
       than 72 bytes.  The command then	writes these lines to standard output
       because no action is specified.

   2.  To display all lines between the	words start and	stop, enter:
	    % awk  '/start/,/stop/'  chapter1

   3.  To run an awk program (sum2.awk)	that processes a file (chapter1),
       enter:
	    % awk  -f  sum2.awk	 chapter1

   4.  The following awk program computes the sum and average of the numbers
       in the second column of the input file:


            {
                    sum += $2
            }
            END {
                    print "Sum:", sum;
                    print "Average:", sum/NR;
            }

       The first action	adds the value of the second field of each line	to
       the sum variable.  The awk command initializes sum, and all variables,
       to 0 (zero) before starting.  The keyword END before the	second action
       causes awk to perform that action after all of the input	file is	read.
       The NR variable,	which is used to calculate the average,	is a special
       variable	containing the number of records (lines) that were read.

   5.  To print	the names of the users who have	the C shell as the initial
       shell, enter:
	    % awk  -F: '$7 ~ /csh/ {print $1}' /etc/passwd

   6.  To print	the first two fields in	reversed order,	enter:
	    % awk '{ print $2, $1 }'

   7.  The following awk program prints	the first two fields of	the input
       file in reversed	order, with input fields separated by a	comma, then
       adds up the first column	and prints the sum and average:


	    BEGIN   { FS = "," }
		    { print $2,	$1}
		    { s	+= $1 }
	    END	    { print "sum is", s, "average is", s/NR }

   8.  The following example shows how command line assignments	synchronize
       with awk	program	statements.

       Consider	the following set of awk statements that make up a program
       named test_program:


	    BEGIN { if (RS == ":")
		    print "Assignment in effect	for BEGIN statements"
		  }
		  { if (RS == ":")
		    print "Assignment in effect	for middle statements"
		  }
	    END	  { if (RS == ":")
		    print "Assignment in effect	for END	statements"
		  }

       Notice the different results that are produced by different ways	of
       assigning a value to RS on the awk command line.	 The file text_file
       contains	the line "Hello, Hello".
	    % awk -f test_program -v RS=: text_file

	    Assignment in effect for BEGIN statements
	    Assignment in effect for middle statements
	    Assignment in effect for END statements

	    % awk -f test_program RS=: text_file

	    Assignment in effect for middle statements
	    Assignment in effect for END statements

	    % awk -f test_program text_file RS=:

	    Assignment in effect for END statements



ENVIRONMENT VARIABLES

  The following	environment variables affect the execution of awk:

  LANG
      Provides a default value for the internationalization variables that
      are unset	or null. If LANG is unset or null, the corresponding value
      from the default locale is used. If any of the internationalization
      variables	contain	an invalid setting, the	utility	behaves	as if none of
      the variables had	been defined.

  LC_ALL
      If set to	a non-empty string value, overrides the	values of all the
      other internationalization variables.

  LC_CTYPE
      Determines the locale for	the interpretation of sequences	of bytes of
      text data	as characters (for example, single-byte	as opposed to multi-
      byte characters in arguments).

  LC_MESSAGES
      Determines the locale for	the format and contents	of diagnostic mes-
      sages written to standard	error.

  NLSPATH
      Determines the location of message catalogs for the processing of
      LC_MESSAGES.

SEE ALSO

  Commands:  grep(1), lex(1), sed(1)

  Routines:  printf(3)

  Programming Support Tools