Manual: (OSF1-V5.1-alpha)



dechanyu(5)							  dechanyu(5)



NAME

  dechanyu - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION

  The DEC Hanyu (dechanyu) codeset consists of the following sets of
  characters:

    +  ASCII

    +  The first and second character planes of	CNS11643-1986

    +  Digital Taiwan Supplemental Character Set (DTSCS)

    +  User-defined characters

  DEC Hanyu uses a combination of single-byte data, 2-byte data, and 4-byte
  data to represent ASCII characters, symbols, or ideographic characters.

  ASCII	characters


  All ASCII characters are represented in the form of single-byte, 7-bit data
  in DEC Hanyu; that is, the most significant bit (MSB) of a byte that
  represents an ASCII character is always off (0). Refer to ascii(5) for more
  information about the ASCII character set.

  CNS11643-1986	Characters (Planes 1 and 2)


  Each plane of	the CNS	11643-1986 character set is divided into 94 rows and
  each of these	rows has 94 columns. The characters defined in plane 1 and
  plane	2 of CNS 11643-1986 are	as follows:

  ______________________________________________________________________
  Character Plane   Character Type                    Number of Characters
  ______________________________________________________________________
  1                 Special characters                651
                    Control characters                33
                    Frequently used characters        5401
  2                 Less frequently used characters   7650
  ______________________________________________________________________

  Note that the	first two planes of the	CNS11643-1986 character	set are	the
  same as those	specified for the revised CNS11643-1992	character set.

  In DEC Hanyu,	each CNS 11643-1986 character is represented by	two bytes, in
  conformance with the CNS 11643-1986 standard.	The MSB	of the first byte is
  always turned	on while that of the second byte is on for the first charac-
  ter plane and	off for	the second character plane.

  The first byte of CNS	11643-1986 encoding determines the row number of the
  character, while the second byte determines its column number. Code ranges
  for the two character	planes are as follows:

  Plane	1
      A1A1 to FEFE

  Plane	2
      A121 to FE7E

  The following	formulas determine the value of	a CNS 11643-1986 character in
  relation to its row and column numbers.

    +  For a CNS 11643-1986 Plane 1 character:

       1st byte	= A0(hex) + Row	number

       2nd byte	= A0(hex) + Column number

    +  For a CNS 11643-1986 Plane 2 character:

       1st byte	= A0(hex) + Row	number

       2nd byte	= 20(hex) + Column number

  For example, if a character is positioned at the first column	of the 36th
  row on CNS 11643 plane 1, its	value is C4A1, which is	calculated as fol-
  lows:

  1st byte = A0(hex) + 36 (24 hex) = C4(hex)
  2nd byte = A0(hex) + 01 (01 hex) = A1(hex)

  Similarly, if	a character is positioned at the first column of the 36th row
  on CNS 11643 plane 2,	its value is C421, which is calculated as follows:

  1st byte = A0(hex) + 36 (24 hex) = C4(hex)
  2nd byte = 20(hex) + 01 (01 hex) = 21(hex)
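
  The formulas above can be sketched in Python as follows. This is only an
  illustration: the function names cns_to_dechanyu and dechanyu_to_cns are
  hypothetical and are not part of any system library. Row and column
  numbers are taken as decimal values from 1 to 94.

```python
def cns_to_dechanyu(plane, row, col):
    """Return the 2-byte DEC Hanyu code for a CNS 11643-1986 position.

    Hypothetical helper: plane is 1 or 2, row and col run from 1 to 94.
    """
    if plane not in (1, 2):
        raise ValueError("only planes 1 and 2 use the 2-byte form")
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # The first byte is A0(hex) + row for both planes.
    # The second byte is A0(hex) + column for plane 1, 20(hex) + column for plane 2.
    second_base = 0xA0 if plane == 1 else 0x20
    return bytes([0xA0 + row, second_base + col])


def dechanyu_to_cns(code):
    """Inverse of cns_to_dechanyu: return (plane, row, col) for a 2-byte code."""
    first, second = code  # raises ValueError unless exactly two bytes
    if not 0xA1 <= first <= 0xFE:
        raise ValueError("first byte outside A1..FE")
    row = first - 0xA0
    if 0xA1 <= second <= 0xFE:      # MSB on: plane 1
        return (1, row, second - 0xA0)
    if 0x21 <= second <= 0x7E:      # MSB off: plane 2
        return (2, row, second - 0x20)
    raise ValueError("second byte outside A1..FE and 21..7E")


print(cns_to_dechanyu(1, 36, 1).hex().upper())  # C4A1
print(cns_to_dechanyu(2, 36, 1).hex().upper())  # C421
```

  The two printed values match the worked examples above for row 36,
  column 1 on planes 1 and 2.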

  DTSCS	Characters


  Currently, only the EDPC (Electronic Data Processing Centre) Recommended
  Character Set, which defines a total of 6319 characters (rows	1 to 68), is
  included in the Digital Taiwan Supplemental Character Set (DTSCS). In the
  revised CNS 11643-1992 standard, the 6319 characters in the EDPC Recom-
  mended Character Set are assigned to the third and fourth character planes
  as follows:

  ________________________________________________________
  EDPC Characters   Character Plane   Number of	Characters
  ________________________________________________________
  Part I	    Plane 3	      6148
  Part II	    Plane 4	      171
  ________________________________________________________

  The characters defined in Plane 3 and Plane 4 of CNS 11643-1992 are as
  follows:

  _________________________________________________________________________
  Character Plane   Character Type                           Number of
                                                             Characters
  _________________________________________________________________________
  3                 Rarely-used characters (EDPC Part I)     6148
  4                 Used for residency system, ISO 2nd       7298
                    edition DIS 10646 Han characters, 171
                    EDPC Part II Characters
  _________________________________________________________________________

  In DEC Hanyu, each DTSCS character is represented by a 4-byte value. The
  first two bytes are the leading value, specifically C2CB, which is used as
  a designator sequence for the DTSCS character set. The MSB of the third and
  fourth bytes is set for the EDPC Recommended Character Set.
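
  Because the second byte of a plane 2 character falls in the ASCII range, a
  DEC Hanyu byte stream has to be scanned from the start to find character
  boundaries. The following Python sketch shows one way to do that, using
  only the rules stated on this page. The function name split_dechanyu and
  the kind labels are hypothetical. The sketch assumes that a C2CB leading
  value not followed by two bytes with the MSB set is an error, and it does
  not check that bytes fall inside the A1..FE or 21..7E code ranges.

```python
DTSCS_LEAD = b"\xc2\xcb"  # designator sequence for DTSCS characters


def split_dechanyu(data):
    """Yield (kind, raw_bytes) for each character in a DEC Hanyu byte string.

    Hypothetical helper; kind is one of "ascii", "cns-plane1",
    "cns-plane2", or "dtscs".
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # MSB off on a leading byte: single-byte ASCII.
            yield ("ascii", data[i:i + 1])
            i += 1
        elif data[i:i + 2] == DTSCS_LEAD:
            # 4-byte form: C2CB followed by two bytes with the MSB set.
            unit = data[i:i + 4]
            if len(unit) < 4 or not (unit[2] & 0x80 and unit[3] & 0x80):
                raise ValueError("malformed DTSCS sequence at offset %d" % i)
            yield ("dtscs", unit)
            i += 4
        else:
            # 2-byte form: the MSB of the second byte selects the plane.
            unit = data[i:i + 2]
            if len(unit) < 2:
                raise ValueError("truncated 2-byte character at offset %d" % i)
            kind = "cns-plane1" if unit[1] & 0x80 else "cns-plane2"
            yield (kind, unit)
            i += 2


sample = b"A\xc4\xa1\xc4\x21\xc2\xcb\xa1\xa1"
for kind, raw in split_dechanyu(sample):
    print(kind, raw.hex().upper())
```

  The check for C2CB comes before the generic 2-byte case because the
  leading value itself has the form of a 2-byte code.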

  User-Defined Characters


  In addition to the two Chinese character sets	described in preceding sec-
  tions, DEC Hanyu provides an area of 3587 positions for user-defined char-
  acters (UDC).	The positions for UDC are those	positions that are unused
  (but not reserved) code points on the	first and second character planes of
  CNS 11643-1986.

  The encoding for UDC is exactly the same as that for CNS11643-1986 except
  that the two sets of characters occupy different regions. Code ranges	for
  UDC are as follows:

  ______________________________________________
  Character Plane   Number of UDC   Code Range
  ______________________________________________
  1		    145		    FDCC to FEFE
  1		    2256	    AAA1 to C1FE
  2		    1186	    F245 to FE7E
  ______________________________________________
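
  The counts in the table can be checked against the code ranges. The Python
  sketch below assumes that each range is inclusive and that only code points
  with a valid first byte (A1 to FE) and a valid second byte (A1 to FE on
  plane 1, 21 to 7E on plane 2) count as positions. The names UDC_RANGES and
  count_positions are hypothetical.

```python
# (plane, first code, last code), copied from the table above.
UDC_RANGES = [
    (1, 0xFDCC, 0xFEFE),
    (1, 0xAAA1, 0xC1FE),
    (2, 0xF245, 0xFE7E),
]


def count_positions(plane, start, end):
    """Count valid 2-byte code points between start and end, inclusive."""
    low, high = (0xA1, 0xFE) if plane == 1 else (0x21, 0x7E)
    count = 0
    for code in range(start, end + 1):
        first, second = code >> 8, code & 0xFF
        if 0xA1 <= first <= 0xFE and low <= second <= high:
            count += 1
    return count


counts = [count_positions(*r) for r in UDC_RANGES]
print(counts, sum(counts))  # [145, 2256, 1186] 3587
```

  Under these assumptions the three ranges add up to the 3587 positions
  stated above.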

  Codeset Conversion


  The following	codeset	converter pairs	are available for converting Tradi-
  tional Chinese characters between dechanyu and other encoding	formats.
  Refer	to iconv_intro(5) for an introduction to codeset conversion. For more
  information about the	other codeset for which	dechanyu is the	input or out-
  put, see the reference page specified	in the list item.

    +  big5_dechanyu, dechanyu_big5

       Converting from and to the Big-5	codeset: big5(5).

       Note that Big-5 encoding	is equivalent to the Microsoft code-page for-
       mat used	on PCs for Traditional Chinese.	See code_page(5) for informa-
       tion about PC code pages.

    +  dechanzi_dechanyu, dechanyu_dechanzi

       Converting from and to the DEC Hanzi codeset: dechanzi(5).

    +  eucTW_dechanyu, dechanyu_eucTW

       Converting from and to Taiwanese	Extended UNIX Code: eucTW(5).

    +  telecode_dechanyu, dechanyu_telecode

       Converting from and to the Telecode codeset: telecode(5).

    +  UCS-2_dechanyu, dechanyu_UCS-2

       Converting from and to UCS-2 format: Unicode(5).

    +  UCS-4_dechanyu, dechanyu_UCS-4

       Converting from and to UCS-4 format: Unicode(5).

    +  UTF-8_dechanyu, dechanyu_UTF-8

       Converting from and to UTF-8 format: Unicode(5).

  Fonts	for DEC	Hanyu Characters


  The operating	system provides	both screen and	printer	fonts for DEC Hanyu
  characters.

  The following	DECwindows Motif fonts are grouped according to	character set
  and family; they reflect various sizes and typefaces for 75dpi and 100dpi
  display devices:

  CNS 11643-1986 fonts (Hei family):

       -adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.cns11643.1986-2
       -adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
       -adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.cns11643.1986-2
       -adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2

  CNS 11643-1986 fonts (Screen family):

       -adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.cns11643.1986-2
       -adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
       -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-2
       -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
       -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-UDC
       -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-UDC

  CNS 11643-1986 fonts (Sung family):

       -adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
       -adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.cns11643.1986-2
       -adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
       -adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.cns11643.1986-2

  DTSCS	fonts (Hei family):

       -adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.dtscs.1990-2
       -adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
       -adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.dtscs.1990-2
       -adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2

  DTSCS	fonts (Screen family):

       -adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.dtscs.1990-2
       -adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
       -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.dtscs.1990-2
       -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2

  DTSCS	fonts (Sung family):

       -adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
       -adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.dtscs.1990-2
       -adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2
       -adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.dtscs.1990-2
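
  The font names listed above follow the X Logical Font Description (XLFD)
  layout of 14 hyphen-separated fields, so the family, size, resolution, and
  character set can be read directly from a name. The Python sketch below
  splits a name into those fields. The function name parse_xlfd and the
  dictionary keys are hypothetical, and the sketch assumes that no field
  contains a hyphen, which holds for the names on this page.

```python
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "charset_registry", "charset_encoding",
)

NUMERIC_FIELDS = ("pixel_size", "point_size", "resolution_x",
                  "resolution_y", "average_width")


def parse_xlfd(name):
    """Split an XLFD font name into a dict keyed by field name."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with a hyphen")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(XLFD_FIELDS), len(parts)))
    fields = dict(zip(XLFD_FIELDS, parts))
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


font = parse_xlfd(
    "-adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.cns11643.1986-2")
print(font["family"], font["pixel_size"], font["resolution_x"],
      font["charset_registry"], font["charset_encoding"])
```

  In XLFD, the point size is given in tenths of a point and the average
  width in tenths of a pixel, so 160 in either field means 16.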

  The operating	system provides	the following PostScript printer fonts for
  CNS 11643-1986 characters:

    +  Hei-Light-CNS11643

    +  Sung-Light-CNS11643

  These	PostScript fonts support only the Traditional Chinese characters in
  planes 1 and 2 of the	CNS 11643 character set. The Traditional Chinese
  characters in	the DTSCS character set	are not	supported by printer fonts.
  The restriction also applies to the eucTW codeset, which also	includes
  DTSCS	characters and is supported by the same	fonts as dechanyu.

  For general information on printing Asian language text, refer to
  i18n_printing(5).

SEE ALSO

  Commands: locale(1)

  Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanzi(5), eucTW(5),
  GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), l10n_intro(5),
  sbig5(5), telecode(5)