Manual: (SunOS-4.1.3)

COLLDEF(8)                  System Manager's Manual                 COLLDEF(8)



NAME
       colldef - convert collation sequence source definition

SYNOPSIS
       /usr/etc/colldef filename

DESCRIPTION
       colldef  converts  a collation sequence source definition into a format
       usable by the strxfrm() and strcoll(3) functions.  It is used to define
       the  many ways in which strings can be ordered and collated.  strxfrm()
       transforms its first argument and places the result in its second argu-
       ment.   The transformed string is such that it can be correctly ordered
       with other transformed strings by using strcmp(),  strncmp(),  or  mem-
       cmp()  (see  string(3) and memory(3)).  strcoll(3) transforms its argu-
       ments and does a comparison.
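       The relationship between the two functions can be sketched with
       Python's locale module, which wraps the host C library's strxfrm()
       and strcoll() (not the SunOS colldef database).  The sketch assumes
       the "C" locale and an illustrative word list; neither comes from
       this manual page.

```python
import functools
import locale

# Assumption: the "C" locale, which every C library provides.
locale.setlocale(locale.LC_COLLATE, "C")

# Illustrative data only.
words = ["relocate", "Zebra", "apple", "re-locate"]

# Sorting on transformed keys: transform each string once, then let the
# sort compare the transformed strings directly.
by_xfrm = sorted(words, key=locale.strxfrm)

# Sorting with strcoll: every comparison collates the two strings itself.
by_coll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Both approaches are expected to yield the same ordering.
print(by_xfrm == by_coll)
```

       Transforming once is usually the better choice when the same strings
       take part in many comparisons, as in a sort.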

       colldef reads the collation sequence source definition from  the  stan-
       dard input and stores the converted definition in filename.  The output
       file produced contains the database with collating sequence information
       in a form usable by system commands and routines.

       The collation sequence definition specifies a set of collating elements
       and the rules defining how strings containing these should be  ordered.
       This is most useful for different language definitions.

       The  colldef  command can support languages whose mapping and collating
       sequences can be described by the following cases:

       o   Ordering of single characters within the codeset.  For example,  in
           Swedish, V is sorted after U, before X and with W (V and W are con-
           sidered identical as far as sorting is concerned).

       o   Equivalence  class  definition.   A  collection  of  characters  is
           defined to have the same primary sorting value.

       o   Ordering  of  "double  characters"  in the collation sequence.  For
           example, in Spanish, ch and ll are collated after c and l,  respec-
           tively.

       o   Ordering of a single character as if it consists of two characters.
           For example, in German, the "sharp s", ß, is sorted as ss.  This  is
           a special instance of the next case below.

       o   Substitution  of  one character with a character string, that is, a
           one-to-many mapping.  In the  example  above,  the  character  ß  is
           replaced with ss during sorting.

       o   Ignoring  certain  characters in the codeset during collation.  For
           example, if `-' is not specified in the specification  table,  then
           the strings re-locate and relocate are equal.

       o   Null  character mapping.  A character is mapped to a null collating
           element, and is ignored in sorting sequences.

       o   Secondary ordering between characters.  Where two characters are
           sorted together in the collation sequence (that is, they have the
           same "primary" ordering), there is sometimes a secondary ordering
           that is used if two strings are identical except for characters
           that have the same primary ordering.  For example, in French, the
           letters e and è have the same primary ordering but e comes before
           è in the secondary ordering.  Thus the word lever would be ordered
           before lèver, but lèver would be sorted before levitate.  Note: if
           e came before è in the primary ordering, then lèver would be
           sorted after levitate.
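
       The primary and secondary ordering in the French example can be
       sketched as a two-level sort key.  The weight tables and the
       collation_key() helper are hypothetical, chosen only to reproduce
       the lever, lèver, levitate ordering; they do not describe colldef's
       internal format.

```python
# Hypothetical weights: e and è share a primary weight, and e precedes è
# in the secondary ordering.
PRIMARY = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
PRIMARY["è"] = PRIMARY["e"]
SECONDARY = {"è": 1}  # every other character defaults to 0


def collation_key(word):
    """Return (primary weights, secondary weights) for a word."""
    primary = tuple(PRIMARY[ch] for ch in word)
    secondary = tuple(SECONDARY.get(ch, 0) for ch in word)
    # Tuples compare element by element, so the secondary weights are
    # consulted only when the primary weights are identical.
    return (primary, secondary)


words = ["levitate", "lèver", "lever"]
print(sorted(words, key=collation_key))
```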

USAGE
       The  specification  file can consist of three statements: charmap, sub-
       stitute, and order.  Of these, only the order  statement  is  required.
       When  charmap  or  substitute  is  supplied,  these  statements must be
       ordered as  above.   Any  statements  after  the  order  statement  are
       ignored.

       Lines  in the specification file beginning with a # are treated as com-
       ments and are ignored.  Blank lines are also ignored.
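
       The statement rules above can be sketched as a small reader for the
       specification file.  The parse_spec() name, its return shape, and
       the choice to raise an error on a misplaced charmap are assumptions
       made for illustration; how colldef itself reports such input is not
       stated here.

```python
def parse_spec(text):
    """Split a specification into its charmap, substitute and order parts."""
    # Join continuation lines: a backslash as the last character of a line.
    logical, pending = [], ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.endswith("\\"):
            pending += stripped[:-1]
            continue
        logical.append(pending + stripped)
        pending = ""

    spec = {"charmap": None, "substitute": [], "order": None}
    for line in logical:
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "charmap":
            if spec["substitute"]:
                # Assumption: treat a misplaced charmap as an error.
                raise ValueError("charmap must precede substitute")
            spec["charmap"] = rest
        elif keyword == "substitute":
            spec["substitute"].append(rest)
        elif keyword == "order":
            spec["order"] = rest
            break  # statements after order are ignored
        else:
            raise ValueError("unknown statement: " + keyword)
    if spec["order"] is None:
        raise ValueError("the order statement is required")
    return spec


demo = '# demo\n\nsubstitute "m" with ""\norder a;...;l;\\\n      n;...;z\n'
print(parse_spec(demo))
```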

       charmap charmapfile
              charmap defines where a mapping of the character  and  collating
              element  symbols  to the actual character encoding can be found.
              The charmapfile filename cannot be a keyword (for example,
              substitute, order, or with) or a special symbol (for example,
              ..., ;, <, >, or ,).

              The format of charmapfile is shown below.  Symbol names are sep-
              arated  from  their  values by TAB or SPACE characters.  symbol-
              value can be specified in a hexadecimal (\x??)  or octal  (\???)
              representation, and can be only one character in length.

                   symbol-name1   symbol-value1
                   symbol-name2   symbol-value2
                   ...

              The following sample charmapfile maps the symbol names, c, h, H,
              and A-grave, to their respective symbol values.

                   c    \x63
                   h    \x68
                   H    \110
                   A-grave   \300

              The symbol names defined in charmapfile can  be  used  in  order
              statements by enclosing the symbol name in angle brackets,
              <symbol-name>.  For example,

                   order     (a, <A-grave>);b;<c>;...;<h>;<H>;i;...;z

              This statement is equivalent to,

                   order     (a, À);b;c;...;h;H;i;...;z

              Symbol names cannot be specified in substitute  fields.   Symbol
              names  also  cannot  be  combined with any other representation,
              such as, <c>h, c<h>, <c>\x68, or <c><h>.  Symbol  names  can  be
              used  with  primary  and  secondary ordering as in the following
              example.

                   order  a;b;c;(<c>,<h>);d;...;z;\
                        A;...;G;{H,<H>};I;...;Z

              The charmap statement is optional.

       substitute char with repl

              The substitute statement substitutes the character char with the
              string repl.

              The simplest use of the substitute statement, mentioned above,
              replaces a single character with two characters, as with the
              substitution of ß with ss in German.

                   substitute "ß" with "ss"

              This  statement  can  also  be  used to specify characters to be
              ignored by mapping them to the null string.

                   substitute "m" with ""

              This is convenient for simplifying order statements.  When  used
              with  the  statement  below, the lower-case m is ignored -- even
              though it is implicitly included in the order statement.

                   order a;...;z

              Without the null string mapping statement above, this  would  be
              specified as,

                   order a;...;l;n;...;z

              The substitute statement is optional.
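
              The effect of substitute can be sketched as a rewrite
              applied to a string before it is ordered.  The
              apply_substitutions() helper and the single-pass
              behavior (replacement text is not substituted again) are
              assumptions made for illustration.

```python
def apply_substitutions(text, substitutions):
    """Replace each character that has a substitute entry with its string."""
    # A character mapped to the empty string simply disappears.
    return "".join(substitutions.get(ch, ch) for ch in text)


# The two substitute statements shown above, as a lookup table.
subs = {"ß": "ss", "m": ""}

print(apply_substitutions("Straße", subs))
print(apply_substitutions("hammer", subs))
```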

       order order_list

              order_list  is  a list of symbols, separated by semicolons, that
              defines the collating sequence.  The special symbol, ..., speci-
              fies,  in  a  short-hand  form,  symbols  that are sequential in
              machine code order.  The following example specifies the list of
              lower-case letters.

                   order a;b;c;d;...;x;y;z

              Of course, this could be further compressed to just a;...;z.
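
              The short-hand form can be sketched as an expansion step.
              The expand_order() helper is hypothetical and assumes
              that each ... has a single-character symbol on both
              sides.

```python
def expand_order(order_list):
    """Expand '...' between two single-character symbols into the full range."""
    symbols = order_list.split(";")
    expanded = []
    for i, sym in enumerate(symbols):
        if sym != "...":
            expanded.append(sym)
            continue
        # The ellipsis stands for the codes strictly between its neighbours.
        first, last = ord(symbols[i - 1]), ord(symbols[i + 1])
        expanded.extend(chr(code) for code in range(first + 1, last))
    return expanded


# Both spellings of the lower-case list expand to the same 26 symbols.
print(expand_order("a;b;c;d;...;x;y;z") == expand_order("a;...;z"))
```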

              A symbol can be up to two characters in length and can be repre-
              sented in any one of the following ways:

              o   The symbol itself (for example, a for the lower-case  letter
                  a).

              o   In  octal  representation  (for example, \141 for the letter
                  a).

              o   In hexadecimal representation (for  example,  \x61  for  the
                  letter a).

              o   The symbol name as defined in the charmap file.

              Any combination of these may be used as well.

              The  backslash  character, \, is used for continuation.  In this
              case, no characters are permitted after the backslash character.

              Symbols enclosed in parentheses are assigned  the  same  primary
              ordering  but different secondary ordering.  Symbols enclosed in
              curly brackets are assigned only the same primary ordering.  For
              example,

                   order a;b;c;ch;d;(e,è);f;...;z;\
                         {1,2,3,4,5,6,7,8,9};A;...;Z

              In  the  above  example,  e  and è are assigned the same primary
              ordering and different secondary ordering, and digits 1  through
              9 are assigned the same primary ordering and no secondary order-
              ing.  Note that the ellipses cannot be  specified  within  curly
              brackets.   Only  primary  ordering is assigned to the remaining
              symbols.  Notice how double letters can be specified in the col-
              lating sequence (letter ch comes between c and d).

              If  a  character  is  not  included in the order statement it is
              excluded from the ordering and will be ignored during sorting.
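
              The grouping rules can be sketched by turning an order
              list into primary and secondary weights.  The helper
              names, the weight values, and the preference for a
              two-character symbol over a single character are
              assumptions made for illustration, not colldef's actual
              algorithm.

```python
import re


def build_weights(order_list):
    """Return {symbol: (primary, secondary)} for an order list."""
    weights = {}
    primary = 0
    # Split on semicolons, keeping (...) and {...} groups whole.
    items = re.findall(r"\([^)]*\)|\{[^}]*\}|[^;]+", order_list)
    for i, item in enumerate(items):
        if item == "...":
            # Fill in the codes strictly between the neighbouring symbols.
            lo, hi = ord(items[i - 1]), ord(items[i + 1])
            for code in range(lo + 1, hi):
                weights[chr(code)] = (primary, 0)
                primary += 1
        elif item[0] in "({":
            for rank, sym in enumerate(item[1:-1].split(",")):
                # Parentheses add a secondary rank; curly brackets do not.
                weights[sym] = (primary, rank if item[0] == "(" else 0)
            primary += 1
        else:
            weights[item] = (primary, 0)
            primary += 1
    return weights


def collation_key(text, weights):
    """Build a two-level key; characters without a weight are ignored."""
    primary, secondary = [], []
    i = 0
    while i < len(text):
        # Prefer a two-character symbol such as "ch" over a single character.
        for size in (2, 1):
            sym = text[i:i + size]
            if len(sym) == size and sym in weights:
                p, s = weights[sym]
                primary.append(p)
                secondary.append(s)
                i += size
                break
        else:
            i += 1  # not in the order list: skip it
    return (primary, secondary)


ORDER = "a;b;c;ch;d;(e,è);f;...;z;{1,2,3,4,5,6,7,8,9};A;...;Z"
weights = build_weights(ORDER)
print(sorted(["dog", "chat", "cut"], key=lambda w: collation_key(w, weights)))
```

              Because `-' does not appear in this order list, the keys
              for re-locate and relocate come out identical under this
              sketch.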

EXAMPLES
       The following example shows the  collation  specification  required  to
       support a hypothetical telephone book sorting sequence.

       The sorting sequence is defined by the following rules:

       o   Upper  and  lower  case  letters must be sorted together, but upper
           case letters have precedence over lower case letters.

       o   All special characters and punctuation should be ignored.

       o   Digits must be sorted as their alphabetic counterparts  (for  exam-
           ple, 0 as zero, 1 as one).

       o   The CH, Ch, ch combinations must be collated between c and D.

       o   V and W, v and w must be collated together.

       The input specification file for this example contains:

                 substitute "0" with "zero"
                 substitute "1" with "one"
                 substitute "2" with "two"
                 substitute "3" with "three"
                 substitute "4" with "four"
                 substitute "5" with "five"
                 substitute "6" with "six"
                 substitute "7" with "seven"
                 substitute "8" with "eight"
                 substitute "9" with "nine"

                 order A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
                       G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
                       Q;q;R;r;S;s;T;t;U;u;{V,W};{v,w};X;x;Y;y;Z;z
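
       The effect of this specification can be approximated with the
       sketch below, which spells out digits, skips characters absent
       from the order list, and assigns one primary weight per list
       position.  The phone_book_key() helper, the weight numbers, and
       the sample names are hypothetical; the sketch follows the
       specification as written and is not colldef's implementation.

```python
import re

# The ten substitute statements, as a lookup table.
DIGIT_NAMES = dict(zip("0123456789", [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]))

ORDER = ("A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;"
         "G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;"
         "Q;q;R;r;S;s;T;t;U;u;{V,W};{v,w};X;x;Y;y;Z;z")

# One primary weight per list position; members of {...} share a weight.
WEIGHTS = {}
for position, item in enumerate(re.findall(r"\{[^}]*\}|[^;]+", ORDER)):
    for symbol in item.strip("{}").split(","):
        WEIGHTS[symbol] = position


def phone_book_key(name):
    """Spell out digits, then map the remaining text to primary weights."""
    text = "".join(DIGIT_NAMES.get(ch, ch) for ch in name)
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:  # CH, Ch or ch
            key.append(WEIGHTS[pair])
            i += 2
        elif text[i] in WEIGHTS:
            key.append(WEIGHTS[text[i]])
            i += 1
        else:
            i += 1  # punctuation and other unlisted characters are ignored
    return key


names = ["Wolf", "Vogel", "Chen", "Dunn", "Cole", "3Com"]
print(sorted(names, key=phone_book_key))
```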

EXIT STATUS
       colldef exits with the following values:

       0      No errors were found and the output was successfully created.

       >0     Errors were found.

FILES
       /etc/locale/LC_COLLATE/locale
                 standard  private  location  for  collation  orders under the
                 locale locale
       /usr/share/lib/locale/LC_COLLATE/locale
                 standard shared  location  for  collation  orders  under  the
                 locale locale

SEE ALSO
       memory(3), strcoll(3), string(3)
                                  30 May 1991                       COLLDEF(8)