COLLDEF(8) System Manager's Manual COLLDEF(8)
colldef - convert collation sequence source definition
colldef converts a collation sequence source definition into a format
usable by the strxfrm(3) and strcoll(3) functions. It is used to define
the many ways in which strings can be ordered and collated. strxfrm()
transforms its first argument and places the result in its second argu-
ment. The transformed string is such that it can be correctly ordered
with other transformed strings by using strcmp(), strncmp(), or mem-
cmp() (see string(3) and memory(3)). strcoll(3) transforms its argu-
ments and does a comparison.
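The relationship between the two functions can be sketched in Python,
whose locale module wraps the C library's strxfrm() and strcoll(). This
is an illustration only: it pins LC_COLLATE to the portable "C" locale,
so it does not exercise a definition built by colldef, and the helper
name sign is made up for the example.

```python
import locale
from functools import cmp_to_key

# Assumption: use the portable "C" locale so the result is predictable.
locale.setlocale(locale.LC_COLLATE, "C")

def sign(n):
    """Reduce a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)

words = ["pear", "Apple", "apple", "fig"]

# Transform each string once, then compare the transformed keys directly.
by_key = sorted(words, key=locale.strxfrm)

# Compare the untransformed strings pairwise with strcoll().
by_cmp = sorted(words, key=cmp_to_key(locale.strcoll))

# Both routes should agree on the relative order of any two strings.
a, b = locale.strxfrm("apple"), locale.strxfrm("fig")
same_order = sign(locale.strcoll("apple", "fig")) == sign((a > b) - (a < b))
```

Transforming once is usually the better choice when the same strings
are compared many times, as in a sort.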
colldef reads the collation sequence source definition from the stan-
dard input and stores the converted definition in filename. The output
file produced contains the database with collating sequence information
in a form usable by system commands and routines.
The collation sequence definition specifies a set of collating elements
and the rules defining how strings containing these should be ordered.
This is most useful for different language definitions.
The colldef command can support languages whose mapping and collating
sequences can be described by the following cases:
o Ordering of single characters within the codeset. For example, in
Swedish, V is sorted after U, before X and with W (V and W are con-
sidered identical as far as sorting is concerned).
o Equivalence class definition. A collection of characters is
defined to have the same primary sorting value.
o Ordering of "double characters" in the collation sequence. For
example, in Spanish, ch and ll are collated after c and l, respectively.
o Ordering of a single character as if it consists of two characters.
For example, in German, the "sharp s", ß, is sorted as ss. This is
a special instance of the next case below.
o Substitution of one character with a character string, that is, a
one-to-many mapping. In the example above, the character ß is
replaced with ss during sorting.
o Ignoring certain characters in the codeset during collation. For
example, if `-' is not specified in the specification table, then
the strings re-locate and relocate are equal.
o Null character mapping. A character is mapped to a null collating
element, and is ignored in sorting sequences.
o Secondary ordering between characters. In the case where two char-
acters are sorted together in the collation sequence, (for example,
they have the same "primary" ordering), there is sometimes a sec-
ondary ordering that is used if two strings are identical except
for characters that have the same primary ordering. For example,
in French, the letters e and è have the same primary ordering but e
comes before è in the secondary ordering. Thus the word lever
would be ordered before lèver, but lèver would be sorted before
levitate. Note: if e came before è in the primary ordering, then
lèver would be sorted after levitate.
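The difference between primary and secondary ordering in the French
example can be modelled in a few lines of Python. This is a sketch of
the idea, not of how colldef stores its tables; make_weights and
sort_key are hypothetical names, and the letter e with a grave accent
is written as è.

```python
def make_weights(groups):
    """Map each character to (primary, secondary) from an ordered list of groups."""
    weights = {}
    for primary, group in enumerate(groups):
        for secondary, ch in enumerate(group):
            weights[ch] = (primary, secondary)
    return weights

def sort_key(word, weights):
    """All primary weights are compared first; secondary weights break ties."""
    ws = [weights[ch] for ch in word if ch in weights]  # unlisted characters are skipped
    return ([p for p, _ in ws], [s for _, s in ws])

letters = "abcdefghijklmnopqrstuvwxyz"

# Case 1: e and è share a primary weight, with e first in the secondary ordering.
shared = make_weights([("e", "è") if ch == "e" else (ch,) for ch in letters])

# Case 2: è has its own primary weight, directly after e.
separate = make_weights([(ch,) for ch in "abcdeèfghijklmnopqrstuvwxyz"])

words = ["levitate", "lèver", "lever"]
with_secondary = sorted(words, key=lambda w: sort_key(w, shared))
primary_only = sorted(words, key=lambda w: sort_key(w, separate))
```

With the shared weight the result is lever, lèver, levitate; with a
separate primary weight lèver moves after levitate.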
The specification file can consist of three statements: charmap, sub-
stitute, and order. Of these, only the order statement is required.
When charmap or substitute is supplied, these statements must be
ordered as above. Any statements after the order statement are ignored.
Lines in the specification file beginning with a # are treated as com-
ments and are ignored. Blank lines are also ignored.
charmap defines where a mapping of the character and collating
element symbols to the actual character encoding can be found.
The charmapfile filename cannot be a keyword (for example, sub-
stitute, order, or with) or special symbols (for example, ...,
;, <<, >>, or ,).
The format of charmapfile is shown below. Symbol names are sep-
arated from their values by TAB or SPACE characters. symbol-
value can be specified in a hexadecimal (\x??) or octal (\???)
representation, and can be only one character in length.
The following sample charmapfile maps the symbol names, c, h, H,
and A-grave, to their respective symbol values.
The symbol names defined in charmapfile can be used in order
statements by enclosing the symbol name in angle brackets, <<sym-
bol-name>>. For example,
order (a, <<A-grave>>);b;<<c>>;...;<<h>>;<<H>>;i;...;z
This statement is equivalent to,
order (a, À);b;c;...;h;H;i;...;z
Symbol names cannot be specified in substitute fields. Symbol
names also cannot be combined with any other representation,
such as, <<c>>h, c<<h>>, <<c>>\x68, or <<c>><<h>>. Symbol names can be
used with primary and secondary ordering as in the following
The charmap statement is optional.
substitute char with repl
The substitute statement substitutes the character char with the string repl.
The simple use of the substitute statement mentioned above sub-
stituted a single character with two characters, as with the
substitution of ß with ss in German.
substitute "ß" with "ss"
This statement can also be used to specify characters to be
ignored by mapping them to the null string.
substitute "m" with ""
This is convenient for simplifying order statements. When used
with the statement below, the lower-case m is ignored -- even
though it is implicitly included in the order statement.
Without the null string mapping statement above, this would be
The substitute statement is optional.
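The effect of substitution can be pictured as a per-character lookup
applied before any ordering takes place. The Python sketch below is an
assumption about the behaviour described here, not colldef's own code;
apply_substitutions and table are hypothetical names, and the table
holds the German sharp s mapping and the null mapping for m.

```python
def apply_substitutions(text, table):
    """Replace each character that has an entry in table; keep the rest."""
    return "".join(table.get(ch, ch) for ch in text)

# One-to-many mapping (sharp s) and a null mapping (m is dropped).
table = {"ß": "ss", "m": ""}

expanded = apply_substitutions("Straße", table)
dropped = apply_substitutions("mammal", table)
```

Here expanded is "Strasse" and dropped is "aal"; a string mapped this
way is what the order statement then sees.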
order_list is a list of symbols, separated by semicolons, that
defines the collating sequence. The special symbol, ..., speci-
fies, in a short-hand form, symbols that are sequential in
machine code order. The following example specifies the list of
Of course, this could be further compressed to just a;...;z.
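How the ... shorthand might expand can be sketched in Python. The
sketch assumes that "machine code order" means code point order and
that the symbols on either side of ... are single characters;
expand_ellipsis is a hypothetical helper, not part of colldef.

```python
def expand_ellipsis(order_list):
    """Expand each '...' into the characters strictly between its neighbours."""
    items = order_list.split(";")
    expanded = []
    for i, item in enumerate(items):
        if item == "...":
            # Assumption: a single-character symbol sits on each side of '...'.
            low, high = ord(items[i - 1]), ord(items[i + 1])
            expanded.extend(chr(code) for code in range(low + 1, high))
        else:
            expanded.append(item)
    return expanded

alphabet = "".join(expand_ellipsis("a;...;z"))
partial = expand_ellipsis("a;...;e;x")
```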
A symbol can be up to two characters in length and can be repre-
sented in any one of the following ways:
o The symbol itself (for example, a for the lower-case letter a).
o In octal representation (for example, \141 for the letter a).
o In hexadecimal representation (for example, \x61 for the letter a).
o The symbol name as defined in the charmap file.
Any combination of these may be used as well.
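The literal, octal and hexadecimal forms all name the same character,
as the following Python sketch shows. decode_symbol is a hypothetical
helper written for this illustration; it handles one character at a
time and does not resolve symbol names from a charmap file.

```python
def decode_symbol(token):
    """Turn a literal, \\ooo (octal) or \\xhh (hexadecimal) token into a character."""
    if token.startswith("\\x"):
        return chr(int(token[2:], 16))   # hexadecimal, e.g. \x61
    if token.startswith("\\"):
        return chr(int(token[1:], 8))    # octal, e.g. \141
    return token                         # the symbol itself

forms = [decode_symbol(t) for t in ["a", "\\141", "\\x61"]]
```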
The backslash character, \, is used for continuation. In this
case, no characters are permitted after the backslash character.
Symbols enclosed in parentheses are assigned the same primary
ordering but different secondary ordering. Symbols enclosed in
curly brackets are assigned only the same primary ordering. For
In the above example, e and è are assigned the same primary
ordering and different secondary ordering, and digits 1 through
9 are assigned the same primary ordering and no secondary order-
ing. Note that the ellipses cannot be specified within curly
brackets. Only primary ordering is assigned to the remaining
symbols. Notice how double letters can be specified in the col-
lating sequence (letter ch comes between c and d).
If a character is not included in the order statement it is
excluded from the ordering and will be ignored during sorting.
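Two of the points above, double letters such as ch and the dropping of
characters that are absent from the order statement, can be sketched in
Python as follows. The helper collation_key and the sample order list
are assumptions made for illustration; only primary ordering is
modelled.

```python
def collation_key(text, order):
    """Primary weights for text; two-character symbols are matched first."""
    rank = {symbol: position for position, symbol in enumerate(order)}
    key, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in rank:
            key.append(rank[pair])       # double letter, e.g. ch
            i += 2
        else:
            if text[i] in rank:
                key.append(rank[text[i]])
            # characters missing from the order list contribute nothing
            i += 1
    return key

# ch is a collating element of its own, placed between c and d.
order = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")

ranked = sorted(["dama", "chico", "cuna"], key=lambda w: collation_key(w, order))
hyphen_ignored = collation_key("re-locate", order) == collation_key("relocate", order)
```

Because ch ranks after c, cuna sorts before chico, and the hyphen, which
is not listed, leaves re-locate and relocate with identical keys.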
The following example shows the collation specification required to
support a hypothetical telephone book sorting sequence.
The sorting sequence is defined by the following rules:
o Upper and lower case letters must be sorted together, but upper
case letters have precedence over lower case letters.
o All special characters and punctuation should be ignored.
o Digits must be sorted as their alphabetic counterparts (for exam-
ple, 0 as zero, 1 as one).
o The CH, Ch, ch combinations must be collated between c and D.
o V and W, v and w must be collated together.
The input specification file for this example contains:
substitute "0" with "zero"
substitute "1" with "one"
substitute "2" with "two"
substitute "3" with "three"
substitute "4" with "four"
substitute "5" with "five"
substitute "6" with "six"
substitute "7" with "seven"
substitute "8" with "eight"
substitute "9" with "nine"
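The combined effect of the five rules can be approximated in Python.
This is a sketch under stated assumptions, not the specification file
itself: the grouping (A,a);(B,b);(C,c);(CH,Ch,ch);(D,d);...;(V,W,v,w)
is one plausible reading of the rules, and build_weights and phone_key
are hypothetical names.

```python
# Digit words taken from the substitute statements above.
DIGIT_WORDS = dict(zip("0123456789", [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"]))

def build_weights():
    """Assumed grouping: upper before lower, CH after C, V and W together."""
    groups = []
    for ch in "abcdefghijklmnopqrstuxyz":    # v and w are added as one group
        groups.append((ch.upper(), ch))
        if ch == "c":
            groups.append(("CH", "Ch", "ch"))
        if ch == "u":
            groups.append(("V", "W", "v", "w"))
    return {symbol: (primary, secondary)
            for primary, group in enumerate(groups)
            for secondary, symbol in enumerate(group)}

WEIGHTS = build_weights()

def phone_key(name):
    """Substitute digits, then collect (primary, secondary) weights."""
    text = "".join(DIGIT_WORDS.get(ch, ch) for ch in name)
    ws, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:
            ws.append(WEIGHTS[pair])
            i += 2
        elif text[i] in WEIGHTS:
            ws.append(WEIGHTS[text[i]])
            i += 1
        else:
            i += 1                           # punctuation and spaces are ignored
    return ([p for p, _ in ws], [s for _, s in ws])

names = ["Davis", "apple", "Chan", "Apple", "Cole"]
book = sorted(names, key=phone_key)
```

Under these assumptions the names sort as Apple, apple, Cole, Chan,
Davis, and "7up" receives the same key as "sevenup".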
colldef exits with the following values:
0 No errors were found and the output was successfully created.
>0 Errors were found.
standard private location for collation orders under the
standard shared location for collation orders under the
memory(3), strcoll(3), string(3)
30 May 1991 COLLDEF(8)