Manual: (OSF1-V5.1-alpha)

GB18030(5)							   GB18030(5)

NAME


  GB18030, gb18030 - A Chinese character set that extends GBK by means of
  4-byte code points

DESCRIPTION


  The GB18030-2000 character set, defined by the Chinese national standards
  organization, is an extension of the GBK character set, which is itself an
  extension of the GB2312-80 character set. (See the GBK(5) reference page.)

  GB18030 incorporates GBK support for all the Hanzi characters specified by
  the ISO 10646-1:1993 standard that are not included in GB2312-80. In
  addition, GB18030 covers essentially the same set of characters covered by
  the Unicode Version 3.0 and ISO/IEC 10646-1:2000 standards.

  GB18030 Code Space and Code Points

  The GB18030 character set has 1-byte, 2-byte, and 4-byte encodings with the
  following structure:

  Number of Bytes   Code Space                             Total Code Points
  1-byte            0x00 to 0x7F                           128
  2-byte            Byte 1: 0x81 to 0xFE                   23940
                    Byte 2: 0x40 to 0xFE (except 0x7F)
  4-byte            Byte 1: 0x81 to 0xFE                   1587600
                    Byte 2: 0x30 to 0x39
                    Byte 3: 0x81 to 0xFE
                    Byte 4: 0x30 to 0x39
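
  The code space in the table can be checked mechanically. The following is
  a minimal Python sketch, not part of the operating system: the function
  name gb18030_sequence_length and the constant names are hypothetical. It
  classifies a byte sequence by the ranges in the table and recomputes the
  totals in the last column.

```python
def gb18030_sequence_length(data: bytes, i: int = 0) -> int:
    """Return 1, 2, or 4 for the sequence starting at data[i].

    Raises ValueError if the bytes do not fit the code space table.
    (Hypothetical helper; it checks structure only, not assignment.)
    """
    b1 = data[i]
    if b1 <= 0x7F:
        return 1                      # 1-byte code: 0x00 to 0x7F
    if not 0x81 <= b1 <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{b1:02X}")
    if i + 1 >= len(data):
        raise ValueError("truncated sequence")
    b2 = data[i + 1]
    if 0x30 <= b2 <= 0x39:            # second byte selects the 4-byte form
        if i + 3 >= len(data):
            raise ValueError("truncated 4-byte sequence")
        b3, b4 = data[i + 2], data[i + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
        raise ValueError("invalid 4-byte sequence")
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2                      # 2-byte code
    raise ValueError(f"invalid second byte 0x{b2:02X}")


# Recompute the "Total Code Points" column from the byte ranges.
LEAD_VALUES = 0xFE - 0x81 + 1               # 126 values, 0x81 to 0xFE
DIGIT_VALUES = 0x39 - 0x30 + 1              # 10 values, 0x30 to 0x39
TRAIL_VALUES = (0xFE - 0x40 + 1) - 1        # 190 values, 0x7F excluded

ONE_BYTE_TOTAL = 0x7F - 0x00 + 1
TWO_BYTE_TOTAL = LEAD_VALUES * TRAIL_VALUES
FOUR_BYTE_TOTAL = LEAD_VALUES * DIGIT_VALUES * LEAD_VALUES * DIGIT_VALUES
```

  Because the second byte of a 2-byte code never falls in 0x30 to 0x39, the
  second byte alone distinguishes the 2-byte and 4-byte forms.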

  The GB18030 1-byte code provides support for ASCII. The 2-byte code
  provides support for all the CJK characters (Chinese, Japanese, and Korean)
  defined in the Unicode 2.1 standard. The 4-byte code provides support for
  the Unicode Version 3.0 additions to Version 2.1. The 4-byte code also
  leaves a large number of unassigned code points that are available for
  future use.
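
  The three encoding lengths can be observed with any GB18030 converter.
  The sketch below assumes Python's built-in "gb18030" codec as a stand-in
  for the operating system's converters; the helper name encoded_length and
  the sample characters are chosen only for illustration.

```python
def encoded_length(ch: str) -> int:
    """Return the number of bytes Python's gb18030 codec uses for ch."""
    return len(ch.encode("gb18030"))


# One sample per encoding length (illustrative choices):
ascii_char = "A"            # ASCII, 1-byte code
hanzi_char = "\u4e2d"       # U+4E2D, a Hanzi present in Unicode 2.1
ext_a_char = "\u3400"       # U+3400, a CJK character added in Unicode 3.0

lengths = [encoded_length(c) for c in (ascii_char, hanzi_char, ext_a_char)]
```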

  The GB18030 character set maps the invalid Unicode code points U+FFFE and
  U+FFFF to 4-byte codes. Because these two characters are invalid in UCS,
  this mapping can cause problems with round-trip character conversions.
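
  One way to guard against this problem is to look for the two code points
  before conversion and let the caller decide how to handle them. The
  following Python sketch is illustrative only; the names NONCHARACTERS and
  find_noncharacters are hypothetical and are not part of any converter.

```python
# The two code points that GB18030 maps to 4-byte codes even though they
# are not valid UCS characters.
NONCHARACTERS = ("\ufffe", "\uffff")


def find_noncharacters(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each U+FFFE or U+FFFF found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in NONCHARACTERS
    ]


# Example: flag the problem character before handing text to a converter.
suspect = find_noncharacters("abc\uffffdef")
```

  A converter on the UCS side might reject or replace these code points, so
  text that contains them may not survive a conversion to UCS and back.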

  The GB18030 character set does not map any 4-byte code to the UCS
  surrogate area (U+D800 through U+DFFF).
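
  Characters outside the Basic Multilingual Plane therefore have their own
  4-byte codes rather than being expressed as surrogate pairs. The Python
  sketch below rests on an assumption that this reference page does not
  state: in the GB18030 standard, supplementary code points are laid out
  linearly starting at the 4-byte code 0x90308130 for U+10000. The function
  names are hypothetical, and Python's built-in "gb18030" codec is used
  only for the surrogate check.

```python
def supplementary_to_gb18030(cp: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to its 4-byte GB18030 code.

    Assumes the linear layout that starts at 0x90 0x30 0x81 0x30.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    index = cp - 0x10000
    index, b4 = divmod(index, 10)     # fourth byte: 0x30 to 0x39
    index, b3 = divmod(index, 126)    # third byte:  0x81 to 0xFE
    b1, b2 = divmod(index, 10)        # second byte: 0x30 to 0x39
    return bytes((0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4))


def encodes_in_gb18030(ch: str) -> bool:
    """Return True if Python's gb18030 codec accepts the character."""
    try:
        ch.encode("gb18030")
        return True
    except UnicodeEncodeError:
        return False


first_supplementary = supplementary_to_gb18030(0x10000)
lone_surrogate_ok = encodes_in_gb18030("\ud800")
```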

  Codeset Converters for GB18030

  The following codeset converter pairs are available for converting
  Simplified Chinese characters between GB18030 and UCS formats. Refer to
  Unicode(5) for more information about the UCS-2, UCS-4, and UTF-8 encoding
  formats. Refer to iconv_intro(5) for an introduction to codeset conversion.

    +  UCS-2_GB18030, GB18030_UCS-2

       Converting from and to UCS-2 format

    +  UCS-4_GB18030, GB18030_UCS-4

       Converting from and to UCS-4 format

    +  UTF-8_GB18030, GB18030_UTF-8

       Converting from and to UTF-8 format
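
  The effect of such a converter pair can be sketched in Python. This is
  not the operating system's converter: the convert function is
  hypothetical, it relies on Python's built-in codecs, and it uses
  "utf-32-be" as an approximation of UCS-4 in big-endian byte order.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode data from the source codec and re-encode it in the target."""
    return data.decode(source).encode(target)


utf8_text = "\u4e2d\u6587".encode("utf-8")          # two Hanzi in UTF-8

# UTF-8 to GB18030 and back, analogous to UTF-8_GB18030 and GB18030_UTF-8.
gb_text = convert(utf8_text, "utf-8", "gb18030")
round_trip = convert(gb_text, "gb18030", "utf-8")

# GB18030 to a 4-byte UCS form, analogous to GB18030_UCS-4.
ucs4_text = convert(gb_text, "gb18030", "utf-32-be")
```

  Decoding to an intermediate Unicode string and re-encoding keeps the
  sketch symmetric, so the same function serves both directions of a pair.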

  Fonts for GB18030

  The operating system provides the following Simplified Chinese TrueType
  fonts for GB18030:





  These fonts can be used for printing with Chinese text printers. The
  operating system uses Unicode fonts and the SongTi font style as the
  default screen font for the GB18030 codeset.

SEE ALSO


  Commands: locale(1)

  Others: ascii(5), big5(5), Chinese(5), dechanyu(5), dechanzi(5), eucTW(5),
  GBK(5), i18n_intro(5), i18n_printing(5), l10n_intro(5), sbig5(5),