Wototo, wototo - Introduction to the Thai language standard
Wototo is the Thai language software standard. It describes Thai characters
and their classifications. This standard also describes the methods used
to input and output Thai characters.
Thai Character Sets
The following two character sets are defined for the Thai language:
+ Basic character set
+ Auxiliary character set
In the basic character set, characters are 8-bit coded and have values from
0 to 255 (hexadecimal 00 to FF). Character values correspond to the
characters defined in standards as follows:
+ Values 0 to 7F correspond to characters from the ISO 646-1983 standard.
+ Values A1 to FB (except for DB, DD and DE) correspond to characters
from the TIS 620-2533 standard.
+ Remaining values are reserved for future use.
The encoded form of the basic character set is called the TACTIS
codeset, which is discussed in the TACTIS(5) reference page.
Characters in the auxiliary character set use the code values 32 to 126 and
161 to 254 only. The Wototo standard specifies that implementations provide
at least one auxiliary character set.
In the TACTIS codeset, characters are organized into different classes.
This classification is done only to facilitate processing and is not related
to Thai linguistic or grammatical rules. The codeset contains the following
classes:
Control characters (CTRL)
Nondisplayable characters that are used for controlling output or data
communication. The sixty-six control character values are: 00 to 1F,
7F, 80 to 9F, and FF.
Consonants (CONS)
The 44 Thai consonants as defined in TIS 620-2533.
Leading Vowels (LV)
The five leading vowels as defined in TIS 620-2533.
Following Vowels (FV)
The six following vowels as defined in TIS 620-2533.
Below Vowels (BV)
The two below vowels as defined in TIS 620-2533.
Above Vowels (AV)
The five above vowels as defined in TIS 620-2533.
Tone Marks (TONE)
The four tone marks as defined in TIS 620-2533.
Above Diacritics (AD)
The four above diacritics as defined in TIS 620-2533.
Below Diacritic (BD)
The below diacritic as defined in TIS 620-2533.
Noncomposable characters (NON)
Those characters that do not fit into any of the preceding character
classes. This group includes 119 characters that users cannot compose with
above vowels, below vowels, tone marks, and above and below diacritics.
Noncomposable characters are divided into the following seven groups:
+ Graphic Characters
The 94 graphic characters defined in ISO 646-1983. These include:
-- 52 English alphabetic characters
-- 10 digits
-- 32 special characters whose values are 21 to 2F, 3A to 40,
5B to 60, and 7B to 7E
+ Space
Character code value is 20.
+ Nobreak space
Character code value is A0.
+ Thai digits
The 10 Thai digits as defined in TIS 620-2533.
+ Thai special characters
The 6 Thai special characters as defined in TIS 620-2533.
+ Word separator
The word separator as defined in TIS 620-2533.
+ Reserved code points
6 code points reserved for future use.
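The seven NON groups can be told apart with a single switch. This sketch uses hypothetical names, and it assumes how the TIS 620-2533 codes in the NON value list (A0, CF, DC, DF, E6, EF, F0 to F9, FA, FB) map onto the groups: F0 to F9 as the Thai digits, DC as the word separator, and CF, DF, E6, EF, FA, and FB as the six Thai special characters. The text does not spell that mapping out.

```c
/* Hypothetical enum: the seven NON groups, plus "not in NON". */
enum non_group {
    NONG_NOT_NON, NONG_GRAPHIC, NONG_SPACE, NONG_NBSP,
    NONG_THAI_DIGIT, NONG_THAI_SPECIAL, NONG_WORD_SEP, NONG_RESERVED
};

enum non_group tac_non_group(unsigned char ch)
{
    if (ch == 0x20)
        return NONG_SPACE;
    if (ch >= 0x21 && ch <= 0x7E)
        return NONG_GRAPHIC;                  /* ISO 646-1983 graphics */
    if (ch == 0xA0)
        return NONG_NBSP;
    if (ch >= 0xF0 && ch <= 0xF9)
        return NONG_THAI_DIGIT;               /* assumption */
    switch (ch) {
    case 0xCF: case 0xDF: case 0xE6:
    case 0xEF: case 0xFA: case 0xFB:
        return NONG_THAI_SPECIAL;             /* assumption */
    case 0xDC:
        return NONG_WORD_SEP;                 /* assumption */
    case 0xDB: case 0xDD: case 0xDE:
    case 0xFC: case 0xFD: case 0xFE:
        return NONG_RESERVED;
    default:
        return NONG_NOT_NON;                  /* CTRL or a composable class */
    }
}
```

Counting every value for which tac_non_group does not return NONG_NOT_NON gives 119, matching the size of the NON class.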
To better describe Thai input and output methods, characters in the classes
FV, BV, AV, and AD are further divided into subclasses. The following list
describes character classes and subclasses by the number of characters in
the class and their encoded values:
CTRL Number: 66
Values: 00 to 1F, 7F, 80 to 9F, and FF
NON Number: 119
Values: 20 to 7E (ISO 646-1983 character codes)
A0, CF, DC, DF, E6, EF, F0 to F9, FA, and FB (TIS 620-2533 character
codes)
DB, DD, DE, FC, FD, and FE (reserved code points)
CONS Number: 44
Values: A1 to C3, C5, and C7 to CE
LV Number: 5
Values: E0, E1, E2, E3, and E4
FV1 Number: 3
Values: D0, D2, and D3
FV2 Number: 1
Values: E5
FV3 Number: 2
Values: C4 and C6
These two characters also behave as leading vowels (LV) in the
character sequence LV+CONS.
BV1 Number: 1
Values: D8
BV2 Number: 1
Values: D9
BD Number: 1
Values: DA
TONE Number: 4
Values: E8, E9, EA, and EB
AD1 Number: 2
Values: EC and ED
AD2 Number: 1
Values: E7
AD3 Number: 1
Values: EE
AV1 Number: 1
Values: D4
AV2 Number: 2
Values: D1 and D6
AV3 Number: 2
Values: D5 and D7
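The class list translates directly into a lookup function. The sketch below is hypothetical: tac_chclass and the TAC_* enum are not names defined by the standard. The single values for FV2, BV1, BV2, BD, AD2, AD3, and AV1 (E5, D8, D9, DA, E7, EE, and D4) are assumed from the TIS 620-2533 code chart; they are the seven code points not claimed by any other class in the list. Every value not matched explicitly falls through to NON.

```c
/* Hypothetical enum: one entry per class or subclass in the list. */
enum tac_class {
    TAC_CTRL, TAC_NON, TAC_CONS, TAC_LV, TAC_FV1, TAC_FV2, TAC_FV3,
    TAC_BV1, TAC_BV2, TAC_BD, TAC_TONE, TAC_AD1, TAC_AD2, TAC_AD3,
    TAC_AV1, TAC_AV2, TAC_AV3, TAC_NCLASSES
};

enum tac_class tac_chclass(unsigned char ch)
{
    if (ch <= 0x1F || (ch >= 0x7F && ch <= 0x9F) || ch == 0xFF)
        return TAC_CTRL;
    if ((ch >= 0xA1 && ch <= 0xC3) || ch == 0xC5 ||
        (ch >= 0xC7 && ch <= 0xCE))
        return TAC_CONS;
    if (ch >= 0xE0 && ch <= 0xE4)
        return TAC_LV;
    if (ch >= 0xE8 && ch <= 0xEB)
        return TAC_TONE;
    switch (ch) {
    case 0xD0: case 0xD2: case 0xD3: return TAC_FV1;
    case 0xC4: case 0xC6:            return TAC_FV3;
    case 0xD1: case 0xD6:            return TAC_AV2;
    case 0xD5: case 0xD7:            return TAC_AV3;
    case 0xEC: case 0xED:            return TAC_AD1;
    /* Assumption: the next seven values come from the TIS 620-2533 chart. */
    case 0xE5: return TAC_FV2;       /* LAKKHANGYAO */
    case 0xD8: return TAC_BV1;       /* SARA U */
    case 0xD9: return TAC_BV2;       /* SARA UU */
    case 0xDA: return TAC_BD;        /* PHINTHU */
    case 0xE7: return TAC_AD2;       /* MAITAIKHU */
    case 0xEE: return TAC_AD3;       /* YAMAKKAN */
    case 0xD4: return TAC_AV1;       /* SARA I */
    default:   return TAC_NON;       /* 20 to 7E and the remaining codes */
    }
}
```

Tallying tac_chclass over all 256 values reproduces the Number column of the list, which is a quick way to check such a table.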
Thai characters are also classified by display level, that is, by their
vertical position relative to the baseline, with a separate level for
characters that are not displayed. Classification by display level
facilitates character input procedures. There are five levels: four for
displayable characters and one for nondisplayable characters, as follows:
+ Nondisplayable level
Includes all control characters in the CTRL class.
+ Base level
Includes all characters in the NON, CONS, FV, and LV classes. Characters
at this level are drawn on the baseline.
+ Above level
Includes all characters in the AD3, AV1, AV2, and AV3 classes. Characters
at this level are drawn immediately above the preceding consonant.
+ Below level
Includes all characters in the BV1, BV2, and BD classes. Characters at
this level are drawn immediately below the preceding consonant.
+ Top level
Includes all characters in the TONE, AD1, and AD2 classes. Characters
at this level are drawn on top of the characters at the above level.
If above level characters do not exist, top level characters are drawn
at the above level. Characters at this level also indicate the end of
The standard specifies that the properties of Thai characters can be tested
by using the following functions.
These functions are not implemented in Tru64 UNIX.
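Because these functions are not available on Tru64 UNIX, a portable substitute may be useful. The following is a minimal sketch of a level lookup under a hypothetical name, my_chlevel, returning the numbering given for TACchlevel (NONDISP 0, TOP 1, ABOVE 2, BASE 3, BELOW 4). It assigns classes to levels as the display-level list describes; the code points for AV1, BV1, BV2, BD, AD2, and AD3 are assumed from the TIS 620-2533 chart.

```c
/* Level constants, numbered as in the TACchlevel description. */
enum { NONDISP = 0, TOP = 1, ABOVE = 2, BASE = 3, BELOW = 4 };

int my_chlevel(unsigned char ch)
{
    if (ch <= 0x1F || (ch >= 0x7F && ch <= 0x9F) || ch == 0xFF)
        return NONDISP;                         /* CTRL class */
    switch (ch) {
    case 0xE8: case 0xE9: case 0xEA: case 0xEB: /* TONE */
    case 0xEC: case 0xED:                       /* AD1 */
    case 0xE7:                                  /* AD2 (assumed code) */
        return TOP;
    case 0xEE:                                  /* AD3 (assumed code) */
    case 0xD4:                                  /* AV1 (assumed code) */
    case 0xD1: case 0xD6:                       /* AV2 */
    case 0xD5: case 0xD7:                       /* AV3 */
        return ABOVE;
    case 0xD8: case 0xD9:                       /* BV1, BV2 (assumed) */
    case 0xDA:                                  /* BD (assumed code) */
        return BELOW;
    default:
        return BASE;                            /* NON, CONS, LV, FV */
    }
}
```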
int TACchlevel(unsigned char ch);
Determines the character level class that the character belongs to and
returns the numeric value 0, 1, 2, 3, or 4. These return values can be
represented by the constants NONDISP, TOP, ABOVE, BASE, and BELOW,
respectively.
Returns TRUE if a character is alphabetic.
Returns TRUE if a character is either alphabetic or a digit.
Returns TRUE if a character belongs to the CTRL class.
Returns TRUE if the character is a digit.
Returns TRUE if the character is not in the NONDISP level class.
Returns TRUE if the character is an English lowercase letter (a to z).
Returns TRUE if the character is an English uppercase letter (A to Z).
Returns TRUE if a character is not in the NONDISP level class.
Returns TRUE if the character is a space, formfeed, newline, return,
tab, vertical tab, or wordbreak character.
Returns TRUE if the character is a hexadecimal digit 0 to 9, A to F, or
a to f. (Thai digits are excluded.)
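Several of the predicates above can be approximated as follows. The my_is* names are hypothetical stand-ins for the standard's functions, and C bool replaces TRUE and FALSE. Two points are assumptions: that "digit" includes the Thai digits F0 to F9 (suggested by their explicit exclusion from the hexadecimal test), and that the wordbreak character is the word separator at DC. The two "not in the NONDISP level class" tests are omitted because the text does not say how they differ.

```c
#include <stdbool.h>

/* CTRL class: 00 to 1F, 7F, 80 to 9F, and FF. */
bool my_iscntrl(unsigned char ch)
{
    return ch <= 0x1F || (ch >= 0x7F && ch <= 0x9F) || ch == 0xFF;
}

bool my_isupper(unsigned char ch) { return ch >= 'A' && ch <= 'Z'; }
bool my_islower(unsigned char ch) { return ch >= 'a' && ch <= 'z'; }

/* Assumption: digits are ASCII 0 to 9 plus Thai digits F0 to F9. */
bool my_isdigit(unsigned char ch)
{
    return (ch >= '0' && ch <= '9') || (ch >= 0xF0 && ch <= 0xF9);
}

/* Hexadecimal digits only; Thai digits are excluded. */
bool my_isxdigit(unsigned char ch)
{
    return (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F') ||
           (ch >= 'a' && ch <= 'f');
}

/* Assumption: the wordbreak character is the word separator, DC. */
bool my_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\f' || ch == '\n' || ch == '\r' ||
           ch == '\t' || ch == '\v' || ch == 0xDC;
}
```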
Thai Input Methods
The input method for Thai characters directly maps characters to keys, as
for English. Thai character sequences are entered character by character
and are displayed from left to right, regardless of whether the sequence
includes forward characters (characters in the NON, CONS, LV, FV1, FV2, and
FV3 classes) or dead characters (characters in all other classes). However,
the following basic rules apply to the character input sequence:
+ Every display cell must begin with a character on the baseline (in the
BASE class).
+ A character in the BASE class that is also in the CONS class may be
followed by an above vowel, a below vowel, a tone mark, a below
diacritic, or an above diacritic.
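The two basic rules can be expressed as a check on each keystroke against the preceding character. This is a simplified sketch with hypothetical names that encodes only the two rules stated here: a forward character is always accepted because it starts a new display cell, and a dead character is accepted only directly after a consonant. It is stricter than the full WTT 2.0 rules; for example, it rejects a tone mark typed after an above vowel. The dead-character values D4, D8, D9, DA, E7, and EE are assumed from the TIS 620-2533 chart.

```c
#include <stdbool.h>

/* CONS values: A1 to C3, C5, and C7 to CE. */
static bool is_cons(unsigned char ch)
{
    return (ch >= 0xA1 && ch <= 0xC3) || ch == 0xC5 ||
           (ch >= 0xC7 && ch <= 0xCE);
}

/* Dead characters: AV, BV, BD, TONE, and AD classes (D1, D4 to DA, E7 to EE). */
static bool is_dead(unsigned char ch)
{
    return ch == 0xD1 || (ch >= 0xD4 && ch <= 0xDA) ||
           (ch >= 0xE7 && ch <= 0xEE);
}

/* May `next` be entered after `prev`?  prev == 0 means no preceding char. */
bool input_ok(unsigned char prev, unsigned char next)
{
    if (!is_dead(next))
        return true;           /* forward character: begins a new cell */
    return is_cons(prev);      /* dead character must follow a consonant */
}

/* Apply input_ok to every adjacent pair of a NUL-terminated string. */
bool input_seq_ok(const unsigned char *s)
{
    unsigned char prev = 0;
    for (; *s; s++) {
        if (!input_ok(prev, *s))
            return false;
        prev = *s;
    }
    return true;
}
```

With this sketch, the sequence E0, A1, E8 (leading vowel, consonant, tone mark) is accepted, while A1, D4, E8 (consonant, above vowel, tone mark) is rejected, which shows where the full WTT 2.0 rule table is needed.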
For more detailed input sequence rules, refer to the Draft Industrial
Standard - Thai Language Software Standard WTT2.0 (Part 2: Thai Input and
Output Methods).
Others: i18n_intro(5), i18n_printing(5), l10n_intro(5), TACTIS(5), Thai(5)