eucTW(5)							     eucTW(5)



NAME

  eucTW - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION

  The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset consists of the
  following character sets:

    +  ASCII

    +  CNS 11643 (Plane 1 to Plane 16)

  Taiwanese EUC uses a combination of single-byte data and 2-byte data to
  represent ASCII characters, symbols, and ideographic characters; characters
  outside plane 1 are additionally preceded by a 2-byte leading code, for a
  total of 4 bytes. Because CNS 11643 spreads its characters across many
  character planes, Taiwanese EUC uses different leading codes to designate
  the different planes.

  ASCII characters are represented in Taiwanese EUC as single-byte 7-bit
  data; that is, the most significant bit (MSB) of the byte that represents
  an ASCII character is always cleared (set to 0). For more information,
  refer to ascii(5).
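  Because an ASCII byte always has its MSB clear, and every byte of a
  multibyte eucTW character has its MSB set (see the encoding description
  on this page), a byte value below 80 (hexadecimal) in eucTW text can only
  be an ASCII character. The following Python sketch illustrates that bit
  test; the function names are hypothetical and are not part of any
  operating system interface.

```python
def ascii_offsets(data):
    """Return the offsets of ASCII bytes (MSB clear) in a eucTW byte string.

    Relies on the eucTW property that every byte of a multibyte character
    has its MSB set, so a byte with the MSB clear is always an ASCII
    character and never part of a Chinese character.
    """
    return [i for i, b in enumerate(data) if (b & 0x80) == 0]


def is_pure_ascii(data):
    """Return True if every byte of the eucTW string is 7-bit ASCII."""
    return all((b & 0x80) == 0 for b in data)
```

  Searching eucTW text for an ASCII delimiter such as / is therefore safe
  without first decoding the multibyte characters.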

  Although the standard Taiwanese EUC codeset includes all characters defined
  by the CNS 11643-1992 standard, the operating system's eucTW implementation
  currently supports the following:

    +  Characters defined in the first and second planes of CNS 11643

    +  The EDPC Recommended Character Set (refer to dechanyu(5) for more
       information)

    +  CNS 11643-1986 and DTSCS characters that have been remapped into the
       third and fourth character planes by the CNS 11643-1992 standard

  Characters that were added to CNS 11643-1986 by the CNS 11643-1992 standard
  are not supported.

  The characters that are defined in plane 1 and plane 2 of CNS 11643-1992
  and that are the same as those defined in CNS 11643-1986 are as follows:

  ________________________________________________________________________
  Character Plane   Character Type                    Number of Characters
  ________________________________________________________________________
  1                 Special characters                651
                    Control characters                33
                    Frequently-used characters        5401
  2                 Less frequently-used characters   7650
  ________________________________________________________________________

  The characters defined in plane 3 and plane 4 of CNS 11643-1992 are as
  follows:

  __________________________________________________________________________________
  Character Plane   Character Type                              Number of Characters
  __________________________________________________________________________________
  3                 Rarely-used characters (EDPC Part I)        6148
  4                 Used for the residency system,              7298
                    ISO 2nd edition DIS 10646 Han characters,
                    and 171 EDPC Part II characters
  __________________________________________________________________________________

  The characters that have been remapped into the third and fourth character
  planes of CNS 11643-1992 as specified by the EDPC are as follows:

  ________________________________________________________
  EDPC Characters   Character Plane   Number of Characters
  ________________________________________________________
  Part I            Plane 3           6148
  Part II           Plane 4           171
  ________________________________________________________

  Taiwanese EUC Encoding


  Except for characters in the first plane of CNS 11643-1986, Taiwanese EUC
  uses a 2-byte leading code to designate the character plane: the 8-bit
  Single-Shift 2 (SS2) control character (hexadecimal 8E), followed by a
  byte that identifies the plane.
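  Because of this leading code, the first byte of a character determines how
  many bytes the character occupies. The following Python sketch assumes the
  standard SS2 value, 8E (hexadecimal), and follows the code ranges in the
  encoding table on this page; the function name char_length is
  hypothetical.

```python
SS2 = 0x8E  # Single-Shift 2 control character (standard C1 code value)


def char_length(lead):
    """Return the byte length of the eucTW character starting with `lead`.

    Sketch based on the encoding table on this page:
      00-7F      ASCII, 1 byte
      8E (SS2)   leading code for planes 2-16, plane byte, 2 position bytes
      A1-FE      plane 1 character, 2 bytes, no leading code
    """
    if lead < 0x80:
        return 1
    if lead == SS2:
        return 4
    if 0xA1 <= lead <= 0xFE:
        return 2
    # Any other byte with the MSB set cannot begin a character.
    raise ValueError(f"0x{lead:02X} cannot start a eucTW character")
```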

  The position of a character on a plane is specified by two bytes. The first
  byte determines the character's row number and the second byte determines
  the character's column number. The MSB of both bytes is always set to 1.
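  The following Python sketch converts between the two position bytes and a
  (row, column) pair. It assumes the conventional EUC numbering, in which
  byte A1 (hexadecimal) corresponds to row or column 1 and byte FE to row or
  column 94; that numbering is not stated on this page. The function names
  are hypothetical.

```python
def position_to_row_col(first, second):
    """Decode the two position bytes of a eucTW character into (row, col).

    Assumes the conventional EUC numbering: number = byte - 0xA0, so byte
    A1 is row/column 1 and byte FE is row/column 94.
    """
    for b in (first, second):
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"position byte 0x{b:02X} is outside A1-FE")
    return first - 0xA0, second - 0xA0


def row_col_to_position(row, col):
    """Encode a 1-based (row, column) pair as the two position bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1-94")
    # Adding 0xA0 both shifts into the A1-FE range and sets the MSB.
    return bytes((row + 0xA0, col + 0xA0))
```

  Under that assumption each plane offers 94 × 94 = 8836 positions, which is
  the number of codes in the A1A1 - FEFE range when each byte is restricted
  to A1 through FE.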

  The following table shows the encoding of Taiwanese EUC characters:

  ______________________________________________________
  CNS 11643-1986 Code Plane   Leading Code   Code Range
  ______________________________________________________
  1			      [nil]	     A1A1 - FEFE
  2			      SS2 A2	     A1A1 - FEFE
  3			      SS2 A3	     A1A1 - FEFE
  4			      SS2 A4	     A1A1 - FEFE
  5			      SS2 A5	     A1A1 - FEFE
  6			      SS2 A6	     A1A1 - FEFE
  7			      SS2 A7	     A1A1 - FEFE
  8			      SS2 A8	     A1A1 - FEFE
  9			      SS2 A9	     A1A1 - FEFE
  10			      SS2 AA	     A1A1 - FEFE
  11			      SS2 AB	     A1A1 - FEFE
  12			      SS2 AC	     A1A1 - FEFE
  13			      SS2 AD	     A1A1 - FEFE
  14			      SS2 AE	     A1A1 - FEFE
  15			      SS2 AF	     A1A1 - FEFE
  16			      SS2 B0	     A1A1 - FEFE
  ______________________________________________________
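  The encoding table can be read as a small decoder and encoder. The
  following Python sketch splits a eucTW byte string into ASCII characters
  and CNS 11643 (plane, row, column) triples, and encodes a triple back into
  bytes. It assumes that SS2 is 8E (hexadecimal) and that a position byte
  equals the row or column number plus A0 (hexadecimal); it accepts only the
  plane bytes A2 through B0 listed in the table. The function names are
  hypothetical, and the sketch does not map characters to glyphs or to
  Unicode.

```python
SS2 = 0x8E  # Single-Shift 2 control character (standard C1 code value)


def _in_range(b):
    """True if b is a valid row/column byte (A1-FE)."""
    return 0xA1 <= b <= 0xFE


def decode_euctw(data):
    """Split eucTW bytes into ASCII strings and (plane, row, col) triples."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # ASCII: one byte, MSB clear.
            out.append(chr(b))
            i += 1
        elif b == SS2:
            # Planes 2-16: SS2, plane byte (A2-B0), row byte, column byte.
            if i + 4 > n:
                raise ValueError(f"truncated SS2 sequence at offset {i}")
            p, r, c = data[i + 1], data[i + 2], data[i + 3]
            if not (0xA2 <= p <= 0xB0 and _in_range(r) and _in_range(c)):
                raise ValueError(f"invalid SS2 sequence at offset {i}")
            out.append((p - 0xA0, r - 0xA0, c - 0xA0))
            i += 4
        elif _in_range(b):
            # Plane 1: row byte and column byte, no leading code.
            if i + 2 > n or not _in_range(data[i + 1]):
                raise ValueError(f"invalid plane 1 character at offset {i}")
            out.append((1, b - 0xA0, data[i + 1] - 0xA0))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} cannot start a character")
    return out


def encode_char(plane, row, col):
    """Encode one CNS 11643 (plane, row, col) position as eucTW bytes."""
    if not (1 <= plane <= 16 and 1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("plane must be 1-16; row and column must be 1-94")
    position = bytes((row + 0xA0, col + 0xA0))
    if plane == 1:
        return position  # plane 1 has no leading code
    return bytes((SS2, plane + 0xA0)) + position
```

  A decoder of this kind could, for example, be used to check that a byte
  string is well-formed eucTW before it is handed to a codeset converter.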

  Codeset Conversion


  The following codeset converter pairs are available for converting
  Traditional Chinese characters between eucTW and other encoding formats.
  Refer to iconv_intro(5) for an introduction to codeset conversion. For more
  information about the other codeset in each pair, see the reference page
  named in the corresponding list item.

    +  big5_eucTW, eucTW_big5

       Converting from and to the Big-5	codeset: big5(5).

       Note that Big-5 encoding is equivalent to the Microsoft code-page
       format used on PCs for Traditional Chinese. You can therefore use this
       set of converters to convert Traditional Chinese text between the
       eucTW and PC code-page formats. For information about how the
       operating system supports PC code pages, see code_page(5).

    +  dechanyu_eucTW, eucTW_dechanyu

       Converting from and to the DEC Hanyu codeset: dechanyu(5).

    +  dechanzi_eucTW, eucTW_dechanzi

       Converting from and to the DEC Hanzi codeset: dechanzi(5).

    +  sbig5_eucTW, eucTW_sbig5

       Converting from and to the Shift Big-5 codeset: sbig5(5).

    +  telecode_eucTW, eucTW_telecode

       Converting from and to the Telecode codeset: telecode(5).

    +  UCS-2_eucTW, eucTW_UCS-2

       Converting from and to UCS-2 format: Unicode(5).

    +  UCS-4_eucTW, eucTW_UCS-4

       Converting from and to UCS-4 format: Unicode(5).

    +  UTF-8_eucTW, eucTW_UTF-8

       Converting from and to UTF-8 format: Unicode(5).

  Fonts for Taiwanese EUC


  For both display devices and printers, the operating system supports
  Taiwanese EUC through internal conversion to DEC Hanyu code and the use of
  DEC Hanyu fonts (see dechanyu(5)).

  For general information on printing non-English text, refer to
  i18n_printing(5).

SEE ALSO

  Commands: locale(1)

  Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5),
  dechanzi(5), GBK(5), iconv_intro(5), i18n_intro(5), i18n_printing(5),
  l10n_intro(5), sbig5(5), telecode(5), Unicode(5)