Wototo(5)							    Wototo(5)



NAME

  Wototo, wototo - Introduction	to the Thai language standard

DESCRIPTION

  Wototo is the	Thai language software standard. It describes Thai characters
  and their classifications.  This standard also describes the methods used
  to input and output Thai characters.

  Thai Character Sets


  The following	two character sets are defined for the Thai language:

    +  Basic character set

    +  Auxiliary character set

  In the basic character set, characters are 8-bit coded and have values from
  0 to 255. Character values, shown here in hexadecimal, correspond to the
  characters defined in standards as follows:

    +  Values 0	to 7F correspond to characters from the	ISO 646-1983 stan-
       dard.

    +  Values A1 to FB (except for DB, DD and DE) correspond to	characters
       from the	TIS 620-2533 standard.

    +  Remaining values	are reserved for future	use.
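
  The mapping in the list above can be sketched in C as follows. This is an
  illustration only: the enum and function names are hypothetical and are not
  defined by the Wototo standard or by Tru64 UNIX.

```c
/* Hypothetical names, for illustration only. */
enum basic_origin {
    ORIGIN_ISO646,   /* values 00 to 7F */
    ORIGIN_TIS620,   /* values A1 to FB, except DB, DD, and DE */
    ORIGIN_RESERVED  /* all remaining values */
};

/* Report which standard defines a basic character set value,
 * following only the three list items above. */
enum basic_origin basic_origin(unsigned char ch)
{
    if (ch <= 0x7F)
        return ORIGIN_ISO646;
    if (ch >= 0xA1 && ch <= 0xFB &&
        ch != 0xDB && ch != 0xDD && ch != 0xDE)
        return ORIGIN_TIS620;
    return ORIGIN_RESERVED;
}
```

  For example, basic_origin(0xA1) yields ORIGIN_TIS620, and basic_origin(0xDB)
  yields ORIGIN_RESERVED.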

  The encoded form of the basic character set is called the TACTIS
  codeset, which is discussed in the TACTIS(5) reference page.

  Characters in the auxiliary character set use only the decimal code values
  32 to 126 and 161 to 254. The Wototo standard specifies that implementations
  provide at least one auxiliary character set.

  Character Classification


  In the TACTIS codeset, characters are organized into different classes.
  This classification is done only to facilitate processing and is not related
  to Thai linguistic or grammatical rules. The codeset contains the following
  character classes:

  Control characters (CTRL)
      Nondisplayable characters	that are used for controlling output or	data
      communication. The sixty-six control character values are: 00 to 1F,
      7F, 80 to	9F, and	FF.

  Consonants (CONS)
      The Thai consonants as defined in	TIS 620-2533.

  Vowels (*V)

      Leading Vowels (LV)
	  The five leading vowels as defined in	TIS 620-2533.

      Following	Vowels (FV)
	  The six following vowels as defined in TIS 620-2533.

      Below Vowels (BV)
	  The two below	vowels as defined in TIS 620-2533.

      Above Vowels (AV)
	  The five above vowels	as defined in TIS 620-2533.

  Tone Marks (TONE)
      The four tone marks as defined in TIS 620-2533.

  Diacritics (*D)

      Above Diacritics (AD)
	  The four above diacritics as defined in TIS 620-2533.

      Below Diacritic (BD)
	  The below diacritic as defined in TIS	620-2533.

  Non-Composibles (NON)
      Those characters that do not fit into the preceding five character classes.
      This group includes 119 characters that users cannot compose with	above
      vowels, below vowels, tone marks,	and above and below diacritics.	Non-
      composible characters are	divided	into the following seven groups:

	+  Graphic Characters

	   The 94 graphic characters defined in ISO 646-1983. These include:

	     --	52 English alphabetic characters

	     --	10 digits

	     --	32 special characters whose values are 21 to 2F, 3A to 40,
		5B to 60, and 7B to 7E

	+  Space

	   Character code value	is 20.

	+  Nobreak space

	   Character code value	is A0.

	+  Thai	digits

	   The 10 Thai digits as defined in TIS	620-2533.

	+  Thai	special	characters

	   The 6 Thai special characters as defined in TIS 620-2533.

	+  Word	separator

	   The word separator as defined in TIS	620-2533.

	+  Reserved code points

	   6 code points reserved for future use.

  To better describe Thai input	and output methods, characters in the classes
  FV, BV, AV, and AD are further divided into subclasses. The following	list
  describes character classes and subclasses by	the number of characters in
  the class and	their encoded values:

  CTRL
      Number: 66

      Values: 00 to 1F,	7F, 80 to 9F, and FF

  NON Number: 119

      Values:

      20 to 7E (ISO 646-1983 character codes)

      A0, CF, DC, DF, E6, EF, F0 to F9,	FA, and	FB (TIS	620-2533 character
      codes)

      DB, DD, DE, FC, FD, and FE (Reserved code points)

  CONS
      Number: 44

      Values: A1 to C3,	C5, and	C7 to CE

  LV  Number: 5

      Values: E0, E1, E2, E3, and E4

  FV1 Number: 3

      Values: D0, D2, and D3

  FV2 Number: 1

      Value: E5

  FV3 Number: 2

      Values: C4 and C6

      These two	characters also	behave as leading vowels (LV) in the charac-
      ter sequence LV+CONS.

  BV1 Number: 1

      Value: D8

  BV2 Number: 1

      Value: D9

  BD  Number: 1

      Value: DA

  TONE
      Number: 4

      Values: E8, E9, EA, and EB

  AD1 Number: 2

      Values: EC and ED

  AD2 Number: 1

      Value: E7

  AD3 Number: 1

      Value: EE

  AV1 Number: 1

      Value: D4

  AV2 Number: 2

      Values: D1 and D6

  AV3 Number: 2

      Values: D5 and D7
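
  The class and subclass values listed above can be collected into a single
  lookup routine. The following is a minimal sketch; the enum and function
  names are hypothetical and do not come from the standard.

```c
/* Hypothetical class tags mirroring the list above. */
enum tac_class {
    TC_CTRL, TC_NON, TC_CONS, TC_LV,
    TC_FV1, TC_FV2, TC_FV3,
    TC_BV1, TC_BV2, TC_BD, TC_TONE,
    TC_AD1, TC_AD2, TC_AD3,
    TC_AV1, TC_AV2, TC_AV3
};

/* Map a TACTIS code value to its class or subclass. */
enum tac_class tac_classify(unsigned char ch)
{
    if (ch <= 0x1F || ch == 0x7F || (ch >= 0x80 && ch <= 0x9F) || ch == 0xFF)
        return TC_CTRL;
    if (ch >= 0x20 && ch <= 0x7E)
        return TC_NON;                      /* ISO 646-1983 codes */
    if ((ch >= 0xA1 && ch <= 0xC3) || ch == 0xC5 ||
        (ch >= 0xC7 && ch <= 0xCE))
        return TC_CONS;

    switch (ch) {
    case 0xE0: case 0xE1: case 0xE2: case 0xE3: case 0xE4:
        return TC_LV;
    case 0xD0: case 0xD2: case 0xD3:
        return TC_FV1;
    case 0xE5:
        return TC_FV2;
    case 0xC4: case 0xC6:
        return TC_FV3;
    case 0xD8:
        return TC_BV1;
    case 0xD9:
        return TC_BV2;
    case 0xDA:
        return TC_BD;
    case 0xE8: case 0xE9: case 0xEA: case 0xEB:
        return TC_TONE;
    case 0xEC: case 0xED:
        return TC_AD1;
    case 0xE7:
        return TC_AD2;
    case 0xEE:
        return TC_AD3;
    case 0xD4:
        return TC_AV1;
    case 0xD1: case 0xD6:
        return TC_AV2;
    case 0xD5: case 0xD7:
        return TC_AV3;
    default:
        /* A0, CF, DC, DF, E6, EF, F0 to FB, and the reserved
         * code points DB, DD, DE, FC, FD, and FE. */
        return TC_NON;
    }
}
```

  Counting the results of tac_classify() over all 256 values is one way to
  check a table against the class sizes listed above (for example, 66 for
  CTRL, 119 for NON, and 44 for CONS).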

  Character Levels


  Thai characters are classified according to different	display	levels (rela-
  tive to baseline and nondisplayable).	Classification by display levels
  facilitates the character input procedures. There are	five character clas-
  sification levels. Four levels include displayable characters	and one	level
  includes nondisplayable characters, as follows:

    +  Nondisplayable level

       Includes	all control characters in the CTRL class.

    +  Base level

       Includes all characters in the NON, CONS, FV, and LV classes. Charac-
       ters at this level are drawn on the baseline.

    +  Above level

       Includes	all characters in the AD3, AV1,	AV2, and AV3 classes. Charac-
       ters at this level are drawn immediately	above final consonants.

    +  Below level

       Includes	all characters in the BV1, BV2,	and BD classes.	Characters at
       this level are drawn immediately	below final consonants.

    +  Top level

       Includes	all characters in the TONE, AD1, and AD2 classes. Characters
       at this level are drawn on top of the characters	at the above level.
       If above	level characters do not	exist, top level characters are	drawn
       at the above level. Characters at this level also indicate the end of
       character cells.
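
  The level assignments above can be expressed as a small function. The sketch
  below is not the TACchlevel() function of the standard; the function name is
  hypothetical, and the constant values assume the numbering 0 to 4 for
  NONDISP, TOP, ABOVE, BASE, and BELOW that this reference page gives for
  TACchlevel().

```c
/* Assumed numbering, taken from the TACchlevel() description. */
enum { NONDISP = 0, TOP = 1, ABOVE = 2, BASE = 3, BELOW = 4 };

/* Hypothetical helper: return the display level of a TACTIS code value. */
int chlevel_sketch(unsigned char ch)
{
    /* CTRL class: nondisplayable level. */
    if (ch <= 0x1F || ch == 0x7F || (ch >= 0x80 && ch <= 0x9F) || ch == 0xFF)
        return NONDISP;

    switch (ch) {
    case 0xE8: case 0xE9: case 0xEA: case 0xEB:   /* TONE */
    case 0xEC: case 0xED:                         /* AD1 */
    case 0xE7:                                    /* AD2 */
        return TOP;
    case 0xEE:                                    /* AD3 */
    case 0xD4:                                    /* AV1 */
    case 0xD1: case 0xD6:                         /* AV2 */
    case 0xD5: case 0xD7:                         /* AV3 */
        return ABOVE;
    case 0xD8: case 0xD9:                         /* BV1, BV2 */
    case 0xDA:                                    /* BD */
        return BELOW;
    default:
        return BASE;                              /* NON, CONS, FV, LV */
    }
}
```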

  The standard specifies that the properties of	Thai characters	can be tested
  by using the following functions.

				     Note

       These functions are not implemented in Tru64 UNIX.

  int TACchlevel(unsigned char ch);
      Determines the character level class that	the character belongs to and
      returns the numeric value	0, 1, 2, 3, or 4.  These return	values can be
      represented by the constants NONDISP, TOP, ABOVE,	BASE, or BELOW,
      respectively.

  TACisalpha(ch);
      Returns TRUE if a	character is alphabetic.

  TACisalnum(ch);
      Returns TRUE if a	character is either alphabetic or a digit.

  TACiscntrl(ch);
      Returns TRUE if a	character belongs to the CTRL class.

  TACisdigit(ch);
      Returns TRUE if the character is a digit.

  TACisgraph(ch);
      Returns TRUE if the character is not in the NONDISP level	class.

  TACislower(ch);
      Returns TRUE if the character is an English lowercase letter (a to z).

  TACisupper(ch);
      Returns TRUE if the character is an English uppercase letter (A to Z).

  TACisprint(ch);
      Returns TRUE if a	character is not in the	NONDISP	level class.

  TACisspace(ch);
      Returns TRUE if the character is a space,	formfeed, newline, return,
      tab, vertical tab, or wordbreak character.

  TACisxdigit(ch);
      Returns TRUE if the character is a hexadecimal digit 0 to	9, A to	F, or
      a	to f. (Thai digits are excluded.)
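
  Because these functions are not implemented in Tru64 UNIX, an application
  that needs such tests would have to supply its own. The following sketch
  covers a few of them. All function names are hypothetical. The code assumes
  that the compiler's execution character set agrees with ISO 646 for the
  values 00 to 7F, and that the "wordbreak character" is the word separator
  at value DC; the second point is an assumption, not something this page
  states.

```c
/* Hypothetical stand-ins for some of the tests described above. */

/* CTRL class: 00 to 1F, 7F, 80 to 9F, and FF. */
int sketch_iscntrl(unsigned char ch)
{
    return ch <= 0x1F || ch == 0x7F || (ch >= 0x80 && ch <= 0x9F) || ch == 0xFF;
}

/* English lowercase letter (a to z). */
int sketch_islower(unsigned char ch)
{
    return ch >= 'a' && ch <= 'z';
}

/* English uppercase letter (A to Z). */
int sketch_isupper(unsigned char ch)
{
    return ch >= 'A' && ch <= 'Z';
}

/* Hexadecimal digit; Thai digits (F0 to F9) are excluded. */
int sketch_isxdigit(unsigned char ch)
{
    return (ch >= '0' && ch <= '9') ||
           (ch >= 'A' && ch <= 'F') ||
           (ch >= 'a' && ch <= 'f');
}

/* Space, formfeed, newline, return, tab, vertical tab,
 * or word separator (assumed to be DC). */
int sketch_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\f' || ch == '\n' || ch == '\r' ||
           ch == '\t' || ch == '\v' || ch == 0xDC;
}
```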

  Thai Input Methods


  The input method for Thai characters directly	maps characters	to keys, as
  for English. Thai character sequences	are entered character by character
  and display from left	to right, regardless of	whether	the sequence includes
  forward characters (characters in the NON, CONS, LV, FV1, FV2, and FV3
  classes) or dead characters (characters in all other displayable classes).
  However, the following basic rules apply to the character input sequence:

    +  Every display cell must begin with a character on the baseline (at the
       BASE level).

    +  A character at the BASE level that is also in the CONS class may be
       followed by an above vowel, a below vowel, a tone mark, a below
       diacritic, or an above diacritic.
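
  The two basic rules above can be checked with a short routine. The sketch
  below is a simplification and does not implement the full WTT 2.0 input
  rules: it assumes that only a consonant may open a cell that contains dead
  characters, and it does not check the order or number of dead characters
  within a cell. The function names are hypothetical.

```c
#include <stddef.h>

/* CONS class: A1 to C3, C5, and C7 to CE. */
static int is_cons(unsigned char ch)
{
    return (ch >= 0xA1 && ch <= 0xC3) || ch == 0xC5 ||
           (ch >= 0xC7 && ch <= 0xCE);
}

/* Dead characters: AV (D1, D4 to D7), BV (D8, D9), BD (DA),
 * AD2 (E7), TONE (E8 to EB), AD1 (EC, ED), and AD3 (EE). */
static int is_dead(unsigned char ch)
{
    return ch == 0xD1 || (ch >= 0xD4 && ch <= 0xDA) ||
           (ch >= 0xE7 && ch <= 0xEE);
}

/* Return 1 if every dead character in s[0..n-1] belongs to a cell
 * that was opened by a consonant, and 0 otherwise. */
int cells_well_formed(const unsigned char *s, size_t n)
{
    int cons_open = 0;  /* 1 while the current cell began with a consonant */

    for (size_t i = 0; i < n; i++) {
        if (is_dead(s[i])) {
            if (!cons_open)
                return 0;           /* dead character with no consonant base */
        } else {
            cons_open = is_cons(s[i]);  /* a new cell begins here */
        }
    }
    return 1;
}
```

  For example, the sequence A1 D4 E8 (a consonant, an above vowel, and a tone
  mark) passes this check, whereas a sequence that starts with D4 does not.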

  For more detailed input sequence rules, refer to the Draft Industrial
  Standard - Thai Language Software Standard WTT2.0 (Part 2: Thai Input and
  Output Methods).





SEE ALSO

  Commands: locale(1)


  Others: i18n_intro(5), i18n_printing(5), l10n_intro(5), TACTIS(5), Thai(5)