Manual: (OSF1-V5.1-alpha)

sort(1)								      sort(1)



NAME

  sort - Sorts or merges files

SYNOPSIS

  sort [-m] [-o output_file] [-Abdfinru] [-k keydef]... [-t character]
  [-T directory] [-y [kilobytes]] [-z record_size]... [file...]

  sort -c [-u] [-Abdfinru] [-k keydef]... [-t character] [-T directory]
  [-y [kilobytes]] [-z record_size]... [file...]

  The following older syntax is maintained for backward compatibility, but
  may be withdrawn in future issues:

  sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory]
  [-y [kilobytes]] [-z record_size] [+fskip[.cskip]] [-fskip[.cskip]]
  [-bdfinr]... [file...]

STANDARDS

  Interfaces documented on this reference page conform to industry standards
  as follows:

  sort:  XCU5.0

  Refer to the standards(5) reference page for more information about
  industry standards and associated tags.

OPTIONS

  The -d, -f, -i, -n, and -r options override the default ordering rules.
  When ordering options appear independent of any key field specifications,
  the requested field ordering rules are applied globally to all sort keys.
  When attached to a specific key (see -k), the specified ordering options
  override all global ordering options for that key.  In the obsolescent
  forms, if one or more of these options follows a +fskip option, it affects
  only the key field specified by that preceding option.

  -A  [Tru64 UNIX]  Sorts on a byte-by-byte basis using each character's
      encoded value.  On some systems, extended characters are treated as
      negative values and therefore sort before ASCII characters.  When you
      sort ASCII text in a locale other than C/POSIX, this option is much
      faster.
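      The Python sketch below is a hypothetical illustration of byte-by-byte
      ordering, not the sort implementation.  It assumes that "extended
      characters considered negative" means each byte is reinterpreted as a
      signed 8-bit char, and shows how that moves a byte such as 0xE9 ahead
      of ASCII.

```python
# Hypothetical illustration of -A style byte-by-byte ordering.
# Assumption: "negative" extended characters are modelled by reading each
# byte as a signed 8-bit char instead of an unsigned one.


def as_signed(b: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed char (-128..127)."""
    return b - 256 if b >= 128 else b


def byte_key(line: bytes, signed: bool = False) -> list:
    """Compare lines by each byte's encoded value, optionally as signed chars."""
    return [as_signed(b) for b in line] if signed else list(line)


lines = [b"zebra", b"\xe9clair", b"apple"]   # 0xE9 is e-acute in ISO 8859-1

unsigned_order = sorted(lines, key=byte_key)
signed_order = sorted(lines, key=lambda s: byte_key(s, signed=True))
print(unsigned_order)   # [b'apple', b'zebra', b'\xe9clair']
print(signed_order)     # [b'\xe9clair', b'apple', b'zebra']
```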

  -b  Ignores leading spaces and tabs when determining the starting and
      ending positions of a restricted sort key.  If the -b option is
      specified before the first -k option, it is applied to all -k options
      on the command line; otherwise, the b modifier can be attached
      independently to each -k field_start or field_end argument.

  -c  Checks that the input is sorted according to the ordering rules
      specified in the options and the collating sequence of the current
      locale.  No output is produced; only the exit code is affected.
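      As a rough model of -c (and of -c combined with -u), the hypothetical
      Python function below reads the input once and returns the exit
      status described under EXIT STATUS.  The key argument stands in for
      whatever ordering options are in effect; it is an assumption of the
      sketch, not a sort interface.

```python
from typing import Callable, Iterable


def check_sorted(lines: Iterable[str],
                 key: Callable[[str], object] = lambda s: s,
                 unique: bool = False) -> int:
    """Return a sort -c style exit status: 0 if ordered, 1 if not."""
    first = True
    prev = None
    for line in lines:
        k = key(line)
        if not first:
            if k < prev:
                return 1                  # out of order
            if unique and k == prev:
                return 1                  # -c -u: two lines with equal keys
        prev, first = k, False
    return 0                              # no output, only the status


print(check_sorted(["apple", "banana", "banana"]))               # 0
print(check_sorted(["apple", "banana", "banana"], unique=True))  # 1
print(check_sorted(["banana", "apple"]))                         # 1
```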

  -d  Specifies that only spaces and alphanumeric characters (according to
      the current setting of LC_CTYPE) are significant in comparisons.

  -f  Treats all lowercase characters as their uppercase equivalents
      (according to the current setting of LC_CTYPE) for the purposes of
      comparison.

  -i  Ignores nonprintable characters, so that only printable characters
      (according to the current setting of LC_CTYPE) are significant in
      comparisons.
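      In the C/POSIX locale, the -d, -f, and -i options amount to simple
      string transformations applied before comparison.  The Python helpers
      below are a sketch under that assumption (ASCII character classes
      only, hypothetical function names); sort itself consults LC_CTYPE.

```python
# Sketch of the -d, -f and -i transformations, assuming the C/POSIX locale.
import string

ALNUM = set(string.ascii_letters + string.digits)
PRINTABLE = {chr(c) for c in range(0x20, 0x7F)}   # ASCII printable, incl. space


def dictionary_order(s: str) -> str:
    """-d: keep only spaces and alphanumeric characters."""
    return "".join(c for c in s if c in ALNUM or c == " ")


def fold_case(s: str) -> str:
    """-f: treat lowercase letters as their uppercase equivalents."""
    return "".join(c.upper() if "a" <= c <= "z" else c for c in s)


def printable_only(s: str) -> str:
    """-i: ignore nonprintable characters."""
    return "".join(c for c in s if c in PRINTABLE)


# -d and -f together, as in "sort -d -f":
words = ["orange", "%%banana", "Apple"]
print(sorted(words, key=lambda s: fold_case(dictionary_order(s))))
# ['Apple', '%%banana', 'orange']
```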

  -k keydef
      Specifies one or more (up to 50) restricted sort key field definitions.
      This option replaces the obsolescent +fskip.cskip and -fskip.cskip
      options.  A field comprises a maximal sequence of nonseparating
      characters and, in the absence of the -t option, any preceding field
      separator.

      The format of a key field definition is as follows:

      field_start[type][,field_end[type]]

      The field_start and field_end arguments define a key field that is
      restricted to a portion of the line, and type is a modifier specified
      by b, d, f, i, n, r, or t.  The b modifier behaves like the -b option,
      but applies only to the field_start or field_end argument to which it
      is attached.  The t modifier indicates that the key field is processed
      as CPU time.  The other modifiers behave like their corresponding
      options, but apply only to the key field to which they are attached;
      these modifiers have this effect if specified with field_start,
      field_end, or both.

      Modifiers attached to a field_start or field_end argument override any
      specifications made by the options.  A missing field_end argument means
      the last character of the line.  When multiple sort keys are specified,
      it is advisable to specify a field_end argument to avoid possible
      confusion.

      The field_start portion of the keydef argument takes the following
      form:

      field_number[.first_character]

      Fields and characters within fields are numbered starting with 1.  The
      field_number and first_character pieces, interpreted as positive
      decimal integers, specify the character to be used as part of a sort
      key.  If first_character is not specified, the default is the first
      character of the field.

      The field_end portion of the keydef argument takes the following form:

      field_number[.last_character]

      The field_number syntax is the same as that described for field_start.
      The last_character argument, interpreted as a nonnegative decimal
      integer, specifies the last character to be used as part of the sort
      key.  If last_character evaluates to 0 (zero) or is not specified, the
      default is the last character of the field specified by field_number.

      If -b is in effect, characters within a field are counted from the
      first nonspace character in the field.  (This applies separately to
      first_character and last_character.)

      If -k is not specified, the default sort key is the entire line.
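      The field arithmetic above can be modelled in a few lines of Python.
      This is a sketch, not the sort source: the names are hypothetical,
      only the b modifier is modelled (as one flag for both ends of the
      key), and character offsets that run past the end of the line are
      assumed to be clamped to the line end.

```python
from typing import List, Optional, Tuple

BLANKS = " \t"


def field_bounds(line: str, sep: Optional[str] = None) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for the fields of line."""
    if sep is not None:                      # -t: every occurrence separates
        bounds, start = [], 0
        for i, c in enumerate(line):
            if c == sep:
                bounds.append((start, i))
                start = i + 1
        bounds.append((start, len(line)))
        return bounds
    bounds, i = [], 0                        # default: leading blanks belong
    while i < len(line):                     # to the field that follows them
        start = i
        while i < len(line) and line[i] in BLANKS:
            i += 1
        while i < len(line) and line[i] not in BLANKS:
            i += 1
        bounds.append((start, i))
    return bounds


def skip(line: str, pos: int) -> int:
    """Advance pos past blanks (the b modifier)."""
    while pos < len(line) and line[pos] in BLANKS:
        pos += 1
    return pos


def extract_key(line: str, f1: int, c1: int = 1,
                f2: Optional[int] = None, c2: int = 0,
                sep: Optional[str] = None, skip_blanks: bool = False) -> str:
    """Key text for -k f1.c1,f2.c2 (f2=None means no field_end was given)."""
    fields = field_bounds(line, sep)
    n = len(line)
    if f1 > len(fields):
        return ""                            # start field does not exist
    pos = fields[f1 - 1][0]
    if skip_blanks:
        pos = skip(line, pos)
    start = min(n, pos + c1 - 1)
    if f2 is None or f2 > len(fields):
        end = n                              # key runs to the end of the line
    elif c2 == 0:
        end = fields[f2 - 1][1]              # .0 or omitted: end of field f2
    else:
        pos = fields[f2 - 1][0]
        if skip_blanks:
            pos = skip(line, pos)
        end = min(n, pos + c2)
    return line[start:max(start, end)]


print(repr(extract_key("yams:104", 2, sep=":")))              # '104'
print(repr(extract_key("  abc  def", 2)))                     # '  def'
print(repr(extract_key("  abc  def", 2, skip_blanks=True)))   # 'def'
```

      The second call shows why -b matters with the default separator: the
      field keeps the blanks that precede it.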

      When there are multiple key fields, later keys are compared only after
      all earlier keys compare as equal.  Except when the -u option is
      specified, lines that otherwise compare as equal are ordered as though
      none of the options -d, -f, -i, -n, or -k were present (but with -r
      still in effect, if it was specified) and with all bytes in the lines
      significant to the comparison.
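      The comparison rule above can be sketched in Python as follows.  Each
      key is a hypothetical (extractor, reverse) pair rather than a parsed
      keydef, int() stands in for -n, and the final whole-line comparison
      uses raw bytes, which corresponds to the C locale.

```python
from functools import cmp_to_key
from typing import Callable, List, Tuple

Key = Tuple[Callable[[str], object], bool]   # (extractor, reverse); hypothetical


def cmp(a, b) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def make_compare(keys: List[Key], reverse: bool = False,
                 unique: bool = False) -> Callable[[str, str], int]:
    """Build a comparator: keys in order, then a whole-line last resort."""
    def compare(x: str, y: str) -> int:
        for extract, rev in keys:
            c = cmp(extract(x), extract(y))
            if c:
                return -c if rev else c      # later keys only break ties
        if unique:
            return 0                         # -u: equal keys mean duplicates
        c = cmp(x.encode(), y.encode())      # all bytes significant
        return -c if reverse else c          # a global -r still applies
    return compare


# sort -t : -k 2n -k 1r vegetables  (the data of Example 5)
keys = [(lambda s: int(s.split(":")[1]), False),
        (lambda s: s.split(":")[0], True)]
vegetables = ["yams:104", "turnips:8", "potatoes:15", "carrots:104",
              "green beans:32", "radishes:5", "lettuce:15"]
print(sorted(vegetables, key=cmp_to_key(make_compare(keys))))
```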

      The algorithm for the -k option can be summarized as follows:


	   /*
	    * -ka.b,c.d	= if d==0 then +(a-1).(b-1) -c.d
	    *		   else	+(a-1).(b-1) -(c-1).d
	    */
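      The same mapping can be written as a small Python function that
      translates a purely numeric keydef into the obsolescent +pos1 -pos2
      form.  It is a sketch: type modifiers (b, d, f, and so on) are not
      handled, and the function name is hypothetical.

```python
import re


def k_to_obsolete(keydef: str) -> str:
    """Translate a numeric keydef "a[.b][,c[.d]]" into "+pos1 [-pos2]"."""
    m = re.fullmatch(r"(\d+)(?:\.(\d+))?(?:,(\d+)(?:\.(\d+))?)?", keydef)
    if m is None:
        raise ValueError(f"unsupported keydef: {keydef!r}")
    a, b = int(m.group(1)), int(m.group(2) or 1)
    start = f"+{a - 1}.{b - 1}"
    if m.group(3) is None:
        return start            # no field_end: both forms run to end of line
    c, d = int(m.group(3)), int(m.group(4) or 0)
    end = f"-{c}.{d}" if d == 0 else f"-{c - 1}.{d}"
    return f"{start} {end}"


for k in ["2", "2,2", "1.2,1.3"]:
    print(f"-k {k}  ->  {k_to_obsolete(k)}")
```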

  -m  Merges only (assumes sorted input).
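      Conceptually, a merge interleaves inputs that are already in order
      without re-sorting them.  A minimal Python sketch using the standard
      heapq.merge function (a stand-in, not what sort uses internally):

```python
import heapq
from typing import Iterable, List


def merge_sorted(*runs: Iterable[str]) -> List[str]:
    """Merge already-sorted sequences of lines; nothing is re-sorted."""
    return list(heapq.merge(*runs))


print(merge_sorted(["apple", "pear"], ["banana", "zucchini"]))
# ['apple', 'banana', 'pear', 'zucchini']
```

      If an input is not actually sorted, its lines are passed through in
      their original order, which is why -m assumes sorted input.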

  -n  Sorts any initial numeric string by arithmetic value.  A numeric
      string consists of optional spaces, an optional minus sign (-), and
      zero (0) or more digits with an optional radix character and thousands
      separator, as defined by the current locale.  An empty digit string is
      treated as zero; leading zeros and signs on zeros do not affect
      ordering.  Only one period (.) can be used in a numeric string; any
      further period, and every character to the right of it, is ignored.
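      A sketch of -n key parsing for the C locale (period as the radix
      character, no thousands separator) might look like this in Python.
      The regular expression, the function name, and the use of Decimal are
      assumptions made for illustration.

```python
import re
from decimal import Decimal

# optional blanks, optional minus sign, digits, optional "." and digits;
# anything after that (including a second period) is ignored
NUMERIC_PREFIX = re.compile(r"[ \t]*(-?)([0-9]*)(?:\.([0-9]*))?")


def numeric_key(s: str) -> Decimal:
    """Arithmetic value of the initial numeric string; empty means zero."""
    m = NUMERIC_PREFIX.match(s)          # always matches: every part is optional
    sign, whole, frac = m.group(1), m.group(2), m.group(3) or ""
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if sign else value     # -0 still compares equal to 0


print(sorted(["10", "9", "-3", "abc", "2.5", "007"], key=numeric_key))
# ['-3', 'abc', '2.5', '007', '9', '10']
```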

  -o output_file
      Directs output to output_file instead of standard output.  The
      output_file can be the same as one of the input files.
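      Writing the result to one of the inputs is only safe if every input
      has been read before the output file is truncated.  The Python sketch
      below shows that ordering; it is an assumption about how such a
      guarantee can be met, not a description of the Tru64 implementation.
      (By contrast, the shell redirection sort file > file truncates file
      before sort reads it.)

```python
import tempfile
from pathlib import Path


def sort_file_in_place(path: str) -> None:
    """Hypothetical helper with the effect of "sort -o path path"."""
    lines = Path(path).read_text().splitlines()    # read all input first
    # only now truncate and rewrite the same file
    Path(path).write_text("".join(line + "\n" for line in sorted(lines)))


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "vegetables"
    demo.write_text("yams:104\nturnips:8\ncarrots:104\n")
    sort_file_in_place(str(demo))
    result = demo.read_text()

print(result, end="")
```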

  -r  Reverses the order of the specified sort.

  -t character
      Sets the field separator character to character.  The character
      argument is not considered to be part of a field (although it can be
      included in a sort key).  Each occurrence of character is significant
      (for example, two consecutive occurrences of character delimit an empty
      field).  To specify the tab character as the field separator, you must
      enclose it in ' ' (single quotes).

      The default field separator is one or more spaces.

  -T directory
      [Tru64 UNIX]  Places all the temporary files that are created in
      directory.

  -u  Suppresses all but one in each set of equal lines (that is, lines
      whose sort keys match exactly).  Ignored characters, such as leading
      tabs and spaces, and characters outside of sort keys are not considered
      in this type of comparison.

      If used with the -c option, -u checks that there are no lines with
      duplicate keys, in addition to checking that the input file is sorted.
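      On sorted input, -u reduces to keeping one line from each run of lines
      whose keys compare equal.  The Python sketch below keeps the first line
      of each run; that choice is an assumption, since Example 2 notes that
      which line survives cannot be predicted.  The function name is
      hypothetical.

```python
from typing import Callable, Iterable, Iterator


def unique_by_key(sorted_lines: Iterable[str],
                  key: Callable[[str], object] = lambda s: s) -> Iterator[str]:
    """Yield the first line of each run of lines whose keys compare equal."""
    sentinel = object()
    prev = sentinel
    for line in sorted_lines:
        k = key(line)
        if prev is sentinel or k != prev:
            yield line
        prev = k


print(list(unique_by_key(["apple", "Apple", "banana"], key=str.lower)))
# ['apple', 'banana']
```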

  -y [kilobytes]
      [Tru64 UNIX]  Starts the sort command using kilobytes of main storage
      and adds storage as needed.  (If kilobytes is less than the minimum
      storage size or greater than the maximum, the minimum or maximum is
      used instead.)  If the -y option is omitted, the sort command starts
      with the default storage size; -y 0 starts with minimum storage, and -y
      (with no value) starts with the maximum storage.  The amount of storage
      used by the sort command has a significant impact on performance.
      Sorting a small file in a large amount of storage is wasteful.

  -z record_size
      Prevents abnormal termination if lines being sorted are longer than the
      default buffer size can handle.  When the -c or -m option is specified,
      the sorting phase is omitted and a buffer of the system default size is
      used; if sorted lines are longer than this size, sort terminates
      abnormally.  The -z option causes the length of the longest line to be
      recorded in the sort phase so that adequate buffers can be allocated in
      the merge phase.  The record_size argument must be a value in bytes
      equal to or greater than the number of bytes in the longest line to be
      merged.

  +fskip.cskip
      Specifies the start position of a key field.  See the -k option for a
      description of the current way to perform this operation.
      (Obsolescent)

      The fskip variable specifies the number of fields to skip from the
      beginning of the input line, and the cskip variable specifies the
      number of additional characters to skip to the right beyond that point.
      For both the starting point (+fskip.cskip) and the ending point
      (-fskip.cskip) of a sort key, fskip is measured from the beginning of
      the input line, and cskip is measured from the last field skipped.  If
      you omit .cskip, .0 (zero) is assumed.  If you omit fskip, 0 (zero) is
      assumed.  If you omit the ending field specifier (-fskip.cskip), the
      end of the line is the end of the sort key.

      You can supply more than one sort key by repeating +fskip.cskip and
      -fskip.cskip.  When you specify more than one sort key, keys specified
      further to the right on the command line are compared only after all
      earlier keys compare as equal.  For example, if the first key is to be
      sorted in numerical order and the second according to the collating
      sequence, all lines whose first key is the number 1 are ordered among
      themselves according to the collating order, and all of them come
      before the lines whose first key is the number 2.  Lines that are
      identical in all keys are sorted with all characters significant.  You
      can also specify different ordering options for each sort key.

  -fskip.cskip
      Specifies the end position of a key field.  See the -k option for a
      description of the current way to perform this operation.
      (Obsolescent)

DESCRIPTION

  The sort command sorts lines in its input files and writes the result to
  standard output.

  The sort command performs one of the following functions:

   1.  Sorts lines of all the named files together and writes the result to
       the specified output.

   2.  Merges lines of all the named (presorted) files together and writes
       the result to the specified output.

   3.  Checks that a single input file is correctly presorted.

  Comparisons are based on one or more sort keys extracted from each line of
  input (or the entire line if no sort keys are specified), and are performed
  using the collating sequence of the current locale.

  The sort command treats all of its input files as one file when it
  performs the sort.  A - (dash) in place of a file name specifies standard
  input.  If you do not specify a file name, sort reads standard input.
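  The following Python sketch models how the file operands are combined into
  one input stream, with - (dash) or an empty operand list meaning standard
  input.  The function name is hypothetical, and the demonstration uses
  temporary files created only for the example.

```python
import os
import sys
import tempfile
from typing import Iterator, List


def read_inputs(operands: List[str]) -> Iterator[str]:
    """Yield the lines of every operand in order, as one logical file."""
    for name in operands or ["-"]:        # no operands: read standard input
        if name == "-":
            yield from sys.stdin          # "-" stands for standard input
        else:
            with open(name) as f:
                yield from f


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a", "pear\nfig\n"), ("b", "apple\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(text)
        paths.append(path)
    merged = sorted(read_inputs(paths))   # lines from both files sorted together

print(merged)   # ['apple\n', 'fig\n', 'pear\n']
```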

  The sort command can handle a variety of collation rules typically used in
  Western European languages, including primary/secondary sorting, one-to-two
  character mapping, N-to-one character mapping, and ignore-character
  mapping.  To summarize briefly:

  Primary/Secondary Sorting


  In this system, a group of characters all sort to the same primary
  location.  If there is a tie, a secondary sort is applied.  For example, in
  French, the plain and accented a's all sort to the same primary location.
  If two strings collate to the same primary location, the secondary sort
  goes into effect.  These words are in correct French order:

       abord
       âpre
       après
       âpreté
       azur
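  Primary/secondary sorting can be approximated in Python by comparing a
  primary key with the accents stripped and breaking ties with a secondary
  key that keeps them.  This is a simplified sketch using the standard
  unicodedata module, not the locale definition sort uses; real French
  collation rules are more detailed than this.

```python
import unicodedata


def primary(word: str) -> str:
    """Base letters only: decompose, then drop the combining accents."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def french_key(word: str) -> tuple:
    """Primary key first; the decomposed word breaks ties (secondary)."""
    return (primary(word), unicodedata.normalize("NFD", word))


words = ["azur", "âpreté", "abord", "après", "âpre"]
print(sorted(words, key=french_key))   # ['abord', 'âpre', 'après', 'âpreté', 'azur']
print(sorted(words))                   # ['abord', 'après', 'azur', 'âpre', 'âpreté']
```

  The second line shows plain code-point order, which puts every word that
  starts with â after azur.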

  One-to-Two Character Mappings


  This system requires that certain single characters be treated as if they
  were two characters.  For example, in German, the ß (scharfes S) is
  collated as if it were ss.
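  A one-to-two mapping can be sketched as a character expansion applied
  before comparison.  The expansion table below is a hypothetical
  single-entry example, not a complete German collation.

```python
EXPANSIONS = {"ß": "ss"}   # hypothetical one-to-two mapping table


def expand(word: str) -> str:
    """Replace each mapped character by its two-character equivalent."""
    return "".join(EXPANSIONS.get(c, c) for c in word)


words = ["Masur", "Maßstab", "Masse"]
print(sorted(words, key=expand))   # ['Masse', 'Maßstab', 'Masur']
print(sorted(words))               # ['Masse', 'Masur', 'Maßstab']
```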

  N-to-One Character Mappings


  Some languages treat a string of characters as if it were one single
  collating element.  For example, in Spanish, the ch and ll sequences are
  treated as their own elements within the alphabet.  (ch comes between c
  and d in the alphabet, and ll comes between l and m.)
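  The N-to-one idea can be sketched by first splitting a word into collating
  elements and then ranking the elements.  The alphabet below (traditional
  Spanish order, lowercase unaccented letters plus ñ) is an assumption made
  for the example; a real locale defines many more elements.  The words are
  those of Example 7.

```python
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
            "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
            "v", "w", "x", "y", "z"]
RANK = {element: i for i, element in enumerate(ALPHABET)}
DIGRAPHS = {"ch", "ll"}          # the N-to-one (here two-to-one) elements


def collating_elements(word: str) -> list:
    """Split a lowercase word into collating elements, digraphs first."""
    elements, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            elements.append(word[i:i + 2])
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements


def spanish_key(word: str) -> list:
    # raises KeyError for characters outside the assumed ALPHABET
    return [RANK[e] for e in collating_elements(word.lower())]


words = ["dama", "loro", "chapa", "canto", "mover", "chocolate", "curioso", "llanura"]
print(sorted(words, key=spanish_key))
```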

  Ignore-Character Mappings


  In some cases, certain characters may be ignored in collation.  For
  example, if - were defined as an ignore-character, the strings re-locate
  and relocate would sort to the same place.

  The results that you get from sort depend on the collating sequence as
  defined by the current setting of the LC_COLLATE environment variable.
  The configuration files for collation and character classification
  information are /usr/lib/nls/loc/src/locale.src.

  A field is one or more characters bounded by the beginning of a line and
  the current field separator, or one or more characters bounded by a field
  separator on either side.  The space character is the default field
  separator.  Lines longer than 1024 bytes are truncated by sort.  The
  maximum number of fields on a line is 50.

EXIT STATUS

  The sort command returns the following exit values:

  0   All input files were output successfully, or -c was specified and the
      input file was correctly sorted.

  1   Under the -c option, the file was not ordered as specified, or, if the
      -c and -u options were both specified, two input lines were found with
      equal keys.

  >1  An error occurred.



EXAMPLES

  The following examples apply to the C locale, unless it is specifically
  stated otherwise.

   1.  To perform a simple sort, enter:
	    sort fruits

       This displays the contents of fruits sorted in ascending lexicographic
       order.  This means that the characters in each column are compared one
       by one, including spaces, digits, and special characters.

       For instance, if fruits contains the text:


	    banana
	    orange
	    Persimmon
	    apple
	    %%banana
	    apple
	    ORANGE

       Then sort fruits displays:
	    %%banana
	    ORANGE
	    Persimmon
	    apple
	    apple
	    banana
	    orange

       This order follows from the fact that in the ASCII collating sequence,
       symbols (such as %) precede uppercase letters, and all uppercase
       letters precede the lowercase letters.  If you are using a different
       collating order, your results may be different.
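       Python's default string ordering compares code points, which for
       ASCII-only text gives the same result as the C locale, so the output
       above can be checked with a one-line sketch:

```python
fruits = ["banana", "orange", "Persimmon", "apple", "%%banana", "apple", "ORANGE"]
print("\n".join(sorted(fruits)))   # same order as "sort fruits" in the C locale
```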

   2.  To group lines that contain uppercase and special characters with
       similar lowercase lines, and remove duplicate lines, enter:
	    sort -d -f -u fruits

       The -u option tells sort to remove duplicate lines, making each line
       of the file unique.  This displays:
	    apple
	    %%banana
	    orange
	    Persimmon

       Not only was the duplicate apple removed, but banana and ORANGE were
       removed as well.  The -d option told sort to ignore symbols, so
       %%banana and banana were considered to be duplicate lines and banana
       was removed.  The -f option told sort not to differentiate between
       uppercase and lowercase, so ORANGE and orange were considered to be
       duplicate lines and ORANGE was removed.

       When the -u option is used with input that contains nonidentical lines
       that sort considers to be duplicates (due to other options), there is
       no way to predict which lines sort will keep and which it will remove.

   3.  To sort as in Example 2, but remove duplicates unless capitalized or
       punctuated differently, enter:
	    sort -u -k 1df -k 1 fruits

       Options appearing between sort key specifiers apply only to the
       specifier preceding them.  There are two sort keys specified in this
       command line.  The -k 1df argument specifies the first key, which
       compares lines the same way as -d -f in Example 2.  Then -k 1 performs
       another comparison to distinguish lines that are not actually
       identical.  This prevents -u, which applies to both keys because it
       precedes the first sort key specifier, from removing lines that are
       not exactly identical to other lines.

       Given the fruits file shown in Example 1, the added -k 1 distinguishes
       %%banana from banana and ORANGE from orange.  However, the two
       instances of apple are exactly identical, so one of them is deleted:
	    apple
	    %%banana
	    banana
	    ORANGE
	    orange
	    Persimmon

   4.  To specify a new field separator, enter:
	    sort -t : -k 2 vegetables

       This sorts vegetables, comparing the text that follows the first colon
       on each line.  The -t : option tells sort that colons separate fields.
       The -k 2 argument tells sort to ignore the first field and to compare
       from the start of the second field to the end of the line.  If
       vegetables contains:


	    yams:104
	    turnips:8
	    potatoes:15
	    carrots:104
	    green beans:32
	    radishes:5
	    lettuce:15

       then sort -t : -k 2 vegetables displays:
	    carrots:104
	    yams:104
	    lettuce:15
	    potatoes:15
	    green beans:32
	    radishes:5
	    turnips:8

       The numbers are not in ascending order.  This is because a
       lexicographic sort compares each character from left to right.  In
       other words, 3 comes before 5, so 32 comes before 5.
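       The difference between comparing the second field as text and as a
       number can be reproduced with a short Python sketch.  The tuple key
       appends the whole line as a tie-breaker, mirroring the comparison with
       all bytes significant that sort applies to lines with equal keys; the
       helper name is hypothetical.

```python
vegetables = ["yams:104", "turnips:8", "potatoes:15", "carrots:104",
              "green beans:32", "radishes:5", "lettuce:15"]


def second_field(line: str) -> str:
    """Text after the first colon, as selected by -t : -k 2."""
    return line.split(":", 1)[1]


as_text = sorted(vegetables, key=lambda s: (second_field(s), s))          # -k 2
as_number = sorted(vegetables, key=lambda s: (int(second_field(s)), s))   # -k 2n
print(as_text)
print(as_number)
```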

   5.  To sort on more than one field, enter:
	    sort -t : -k 2n -k 1r vegetables

       This performs a numeric sort on the second field (-k 2n) and then,
       within that ordering, sorts the first field in reverse collating order
       (-k 1r).  The output looks like this:
	    radishes:5
	    turnips:8
	    potatoes:15
	    lettuce:15
	    green beans:32
	    yams:104
	    carrots:104

       The lines are sorted in numeric order; when two lines have the same
       number, they appear in reverse collating order.

   6.  To replace the original file with the sorted text, enter:
	    sort -o vegetables vegetables

       The -o vegetables option stores the sorted output into the file
       vegetables.

   7.  To collate using Spanish rules, set the LC_COLLATE (or LANG)
       environment variable to a Spanish locale, and then use sort in the
       regular way:
	    sort sp.words

       If an input file named sp.words contains the following Spanish words:


	    dama
	    loro
	    chapa
	    canto
	    mover
	    chocolate
	    curioso
	    llanura

       The sorted file looks like this:
	    canto
	    curioso
	    chapa
	    chocolate
	    dama
	    loro
	    llanura
	    mover

       If you sort the file in the default C locale, the output looks like
       this:
	    canto
	    chapa
	    chocolate
	    curioso
	    dama
	    llanura
	    loro
	    mover



ENVIRONMENT VARIABLES

  The following environment variables affect the execution of sort:

  LANG
      Provides a default value for the internationalization variables that
      are unset or null.  If LANG is unset or null, the corresponding value
      from the default locale is used.  If any of the internationalization
      variables contains an invalid setting, the utility behaves as if none
      of the variables had been defined.

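  LC_ALL
      If set to a non-empty string value, overrides the values of all the
      other internationalization variables.

  LC_COLLATE
      Determines the locale for the collating sequence that sort uses when
      comparing lines and sort keys.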

  LC_CTYPE
      Determines the locale for the interpretation of sequences of bytes of
      text data as characters (for example, single-byte as opposed to
      multibyte characters in arguments) and the behavior of character
      classification for the -b, -d, -f, -i, and -n options.

  LC_MESSAGES
      Determines the locale for the format and contents of diagnostic
      messages written to standard error.

  NLSPATH
      Determines the location of message catalogues for the processing of
      LC_MESSAGES.

FILES

  /usr/lib/nls/loc/src/locale.src
      Configuration files

SEE ALSO

  Commands:  comm(1), join(1), uniq(1)

  Functions:  setlocale(3), tolower(3)

  Files:  locale(4)

  Standards:  standards(5)