dechanzi(5)							  dechanzi(5)



NAME

  dechanzi - A character encoding system (codeset) for Simplified Chinese

DESCRIPTION

  The DEC Hanzi (dechanzi) codeset consists of the following character sets:

    +  ASCII

    +  GB2312-80

    +  Extended GB

  DEC Hanzi uses a 2-byte data representation for symbols and ideographic
  characters that are defined in GB2312-80.

  ASCII Characters


  All ASCII characters are represented as single-byte, 7-bit data in the DEC
  Hanzi codeset; that is, the most significant bit (MSB) of the byte that
  represents an ASCII character is always set off. For more information on
  ASCII characters, refer to ascii(5).

  GB2312-80 Characters


  The code table for GB2312-80 characters is divided into 94 rows (Qu),
  numbered from 1 to 94. Each row has 94 columns (Wei), also numbered from 1
  to 94. The code table defines a total of 7445 characters, of which 6763 are
  Chinese characters. The characters in the table are grouped as follows:

    +  Graphic symbols

       There are 682 graphic symbols, which occupy rows 1 to 9 in the code
       table.

    +  Frequently used (Level 1) characters

       There are 3755 frequently used characters, which occupy rows 16 to 55
       in the code table.

    +  Less frequently used (Level 2) characters

       There are 3008 less frequently used characters, which occupy rows 56
       to 87 in the code table.
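
  The grouping above can be sketched as a small lookup. This is only an
  illustration: the function name gb2312_row_group is hypothetical, and rows
  that the list does not mention are reported as "unassigned" by assumption.

```python
def gb2312_row_group(row: int) -> str:
    """Return the GB2312-80 group for a row (Qu) number, per the list above."""
    if not 1 <= row <= 94:
        raise ValueError("row must be between 1 and 94")
    if row <= 9:
        return "graphic symbols"
    if 16 <= row <= 55:
        return "level 1"
    if 56 <= row <= 87:
        return "level 2"
    # Assumption: rows not named in the list carry no defined characters.
    return "unassigned"


# The three group sizes add up to the table total quoted in the text.
assert 682 + 3755 + 3008 == 7445
# The two Chinese-character levels account for the 6763 Chinese characters.
assert 3755 + 3008 == 6763
```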

  To differentiate GB2312-80 character codes from ASCII and Extended GB
  character codes, the most significant bit (MSB) of both the first byte and
  the second byte is set on. The following formulas show how to calculate the
  value for a GB2312-80 character from its row and column numbers:

  1st byte = A0(hex) + Row number
  2nd byte = A0(hex) + Column number

  For example, if a GB2312-80 character is in the first column of the 16th
  row, the character's value is B0A1. Row 16 (decimal) is 10 (hex), so the
  value is calculated as follows:

  1st byte = A0(hex) + 10(hex) = B0(hex)
  2nd byte = A0(hex) + 01(hex) = A1(hex)
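
  As a minimal sketch, the two formulas and their inverse can be written as
  follows. The function names are hypothetical and are not part of any
  system library.

```python
def gb2312_bytes(row: int, col: int) -> bytes:
    """Encode a GB2312-80 (row, column) position as a 2-byte DEC Hanzi code."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be between 1 and 94")
    # Adding A0 (hex) sets the MSB of both bytes.
    return bytes([0xA0 + row, 0xA0 + col])


def gb2312_position(code: bytes) -> tuple:
    """Recover (row, column) from a 2-byte GB2312-80 code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("not a GB2312-80 code")
    return code[0] - 0xA0, code[1] - 0xA0


# Row 16, column 1, as in the example above.
print(gb2312_bytes(16, 1).hex().upper())  # B0A1
```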

  Extended GB Characters


  The Extended GB code table is similar to the GB2312-80 code table and is
  divided into 94 rows and 94 columns (8836 code points). However, the
  Extended GB code table provides code points for user-defined characters
  (UDC). The 8836 code points in this table are divided into two areas:

    +  User-defined area

       This area spans rows 1 to 87 and provides 8178 code points.

    +  User-defined (reserved) area

       This area spans rows 88 to 94 and provides 658 code points. This area
       is where users can define special and long-lasting user-defined
       characters.
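
  The code-point counts above follow from 94 columns per row. A short
  sketch that checks the arithmetic and maps a row to its area (the function
  name extended_gb_area is hypothetical):

```python
COLUMNS_PER_ROW = 94


def extended_gb_area(row: int) -> str:
    """Return the Extended GB area that a row number falls in."""
    if not 1 <= row <= 94:
        raise ValueError("row must be between 1 and 94")
    return "user-defined" if row <= 87 else "user-defined (reserved)"


# Rows 1 to 87 and rows 88 to 94, each with 94 columns.
assert 87 * COLUMNS_PER_ROW == 8178
assert 7 * COLUMNS_PER_ROW == 658
# Together they cover the whole 94 x 94 table.
assert 8178 + 658 == 94 * 94 == 8836
```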

  To differentiate Extended GB codes from ASCII codes and GB2312-80 codes,
  the most significant bit (MSB) of the first byte is set on while that of
  the second byte is set off. The following formulas show how the code value
  of an Extended GB character is calculated from its row and column numbers:

  1st byte = A0(hex) + Row number
  2nd byte = 20(hex) + Column number

  For example, if a character is positioned at the first column of the 16th
  row of the Extended GB code table, the character's value is B021. Row 16
  (decimal) is 10 (hex), so the value is calculated as follows:

  1st byte = A0(hex) + 10(hex) = B0(hex)
  2nd byte = 20(hex) + 01(hex) = 21(hex)
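
  The MSB rules for the three character sets are enough to split a byte
  stream into characters. The following is a sketch under that assumption
  only: the function names are hypothetical, and the code checks MSBs without
  validating that each byte lies in the range the formulas can produce.

```python
def extended_gb_bytes(row: int, col: int) -> bytes:
    """Encode an Extended GB (row, column) position as a 2-byte code."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be between 1 and 94")
    # First byte has its MSB on; second byte stays below 80 (hex).
    return bytes([0xA0 + row, 0x20 + col])


def split_dechanzi(data: bytes) -> list:
    """Split data into (character set, bytes) pairs using the MSB rules."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] & 0x80 == 0:
            # MSB off in the first byte: single-byte ASCII.
            tokens.append(("ASCII", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated 2-byte character")
        # MSB of the second byte separates GB2312-80 from Extended GB.
        kind = "GB2312-80" if data[i + 1] & 0x80 else "Extended GB"
        tokens.append((kind, data[i:i + 2]))
        i += 2
    return tokens


print(extended_gb_bytes(16, 1).hex().upper())  # B021
```

  For instance, split_dechanzi(b"A\xb0\xa1\xb0\x21") yields one ASCII token,
  one GB2312-80 token, and one Extended GB token, in that order.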

  Codeset Conversion


  The following codeset converter pairs are available for converting
  Simplified Chinese characters between dechanzi and other encoding formats.
  Refer to iconv_intro(5) for an introduction to codeset conversion. For more
  information about the other codeset for which dechanzi is the input or
  output, see the reference page specified in the list item.

    +  big5_dechanzi, dechanzi_big5

       Converting from and to the Big-5 codeset: big5(5)

    +  dechanyu_dechanzi, dechanzi_dechanyu

       Converting from and to the DEC Hanyu codeset: dechanyu(5)

    +  eucTW_dechanzi, dechanzi_eucTW

       Converting from and to Taiwanese Extended UNIX Code: eucTW(5)

    +  UCS-2_dechanzi, dechanzi_UCS-2

       Converting from and to UCS-2 format: Unicode(5)

    +  UCS-4_dechanzi, dechanzi_UCS-4

       Converting from and to UCS-4 format: Unicode(5)

    +  UTF-8_dechanzi, dechanzi_UTF-8

       Converting from and to UTF-8 format: Unicode(5)

  DEC Hanzi encoding is identical to the Microsoft code-page format (cp936)
  used for Simplified Chinese characters on PC systems. However, DEC Hanzi
  supports fewer characters than the code page does. Therefore, using
  converters with dechanzi in the converter name to convert between cp936 and
  other formats can result in some data loss. Refer to code_page(5) for more
  information about PC code pages.
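
  On a system without the dechanzi converters, the effect of a
  dechanzi_UTF-8 conversion can be approximated for the ASCII and GB2312-80
  portions only. The sketch below uses Python's built-in "gb2312" codec as a
  stand-in; this is an assumption, not the system converter, and it does not
  handle Extended GB (user-defined) characters. The function name is
  hypothetical.

```python
def gb2312_subset_to_utf8(data: bytes) -> bytes:
    """Convert ASCII and GB2312-80 bytes to UTF-8.

    Stand-in for a dechanzi to UTF-8 conversion. Python's "gb2312" codec
    uses the same byte layout for ASCII and GB2312-80 characters, but it
    raises UnicodeDecodeError on anything else, including Extended GB codes.
    """
    return data.decode("gb2312").encode("utf-8")


# "A" followed by the character at row 16, column 1 (B0A1).
sample = b"A\xb0\xa1"
print(gb2312_subset_to_utf8(sample))
```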

  DEC Hanzi Fonts


  The operating system provides both screen and printer fonts for DEC Hanzi
  characters. The operating system also provides bit map fonts in addition to
  the TrueType fonts described in this section. For a complete description of
  DEC Hanzi fonts, see the document, Technical Reference for Using Chinese
  Features.

  The following set of Simplified Chinese TrueType fonts is installed as the
  operating system default fonts for DEC Hanzi:

  FangSong
	       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1


  HeiTi
	       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1


  KaiTi
	       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1


  SongTi
	       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1




  The following set of Simplified Chinese TrueType fonts is available as an
  installation option:

  FangSong
	       -huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -huatian-fangsong-medium-r-normal--0-0-0-0-m-0-iso8859-1


  HeiTi
	       -huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -huatian-heiti-medium-r-normal--0-0-0-0-m-0-iso8859-1


  KaiTi
	       -huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -huatian-kaiti-medium-r-normal--0-0-0-0-m-0-iso8859-1


  SongTi
	       -huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
	       -huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
	       -huatian-songti-medium-r-normal--0-0-0-0-m-0-iso8859-1



  With either the default or optional font sets installed, the SongTi fonts
  are the default screen fonts for the DEC Hanzi codeset.

  The operating system provides the following PostScript printer fonts for
  DEC Hanzi characters:

    +  Hei-GB2312-80

    +  XiSong-GB2312-80

  For general information on printing Asian language text, refer to
  i18n_printing(5).

SEE ALSO

  Commands: locale(1)

  Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), eucTW(5),
  GB18030(5), GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5),
  l10n_intro(5), sbig5(5), telecode(5), Unicode(5)