Manual: (OSF1-V5.1-alpha)
big5(5)								      big5(5)



NAME

  big5 - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION

  The big5 codeset is one of several codesets that support the Traditional
  Chinese language.  This codeset includes the following character sets:

    +  ASCII

    +  Big-5

  The big5 codeset uses a combination of single-byte data and two-byte data
  to represent ASCII characters, symbols, and Chinese ideographic characters.

  ASCII Characters


  All ASCII characters are represented in the form of single-byte, 7-bit data
  in the big5 codeset; that is, the most significant bit (MSB) of a byte that
  represents an ASCII character is always off (0).  For more information, see
  ascii(5).
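
  Because the MSB is off in every ASCII byte and on in the first byte of
  every two-byte Big-5 character (see Code Values for Big-5 Characters), a
  big5 byte string can be split into characters by inspecting that bit.  The
  following Python sketch illustrates the idea; the function name split_big5
  is hypothetical and is not part of the operating system.

```python
def split_big5(data):
    """Split a big5 byte string into 1-byte ASCII and 2-byte Big-5 units.

    Hypothetical helper for illustration only; it checks the MSB of the
    first byte and does not validate the second byte.
    """
    units = []
    i = 0
    while i < len(data):
        if data[i] & 0x80 == 0:
            # MSB off: a single-byte ASCII character.
            units.append(data[i:i + 1])
            i += 1
        else:
            # MSB on: the first byte of a two-byte Big-5 character.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character at offset %d" % i)
            units.append(data[i:i + 2])
            i += 2
    return units
```

  Note that the second byte of a Big-5 character may itself have the MSB
  off, so the string must be scanned from the start; a byte examined in
  isolation cannot always be identified as ASCII.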

  Big-5 Character Groups


  The Big-5 character set defines the following character groups:

    +  Special symbols (408)

    +  Level 1 characters (5401)

    +  Level 2 characters (7652)

    +  Level 1 user-defined space (785)

    +  Level 2 user-defined space (2983)

    +  Level 3 user-defined space (2041)

  Code Values for Big-5 Characters


  Each Big-5 character is represented by a two-byte code that complies with
  the Big-5 standard. The MSB of the first byte is always on (1), while that
  of the second byte can be on or off. Code ranges (in hexadecimal) for
  characters in the different character groups are as follows:

    +  Special symbols: A140 to A3BF

    +  Level 1 characters: A440 to C67E

    +  Level 2 characters: C940 to F9D5

    +  Level 1 user-defined space: FA40 to FEFE

    +  Level 2 user-defined space: 8E40 to A0FE

    +  Level 3 user-defined space: 8140 to 8DFE

  Across the whole Big-5 code space, the valid code range for the first byte
  is 81 to FE, while the valid ranges for the second byte are 40 to 7E and
  A1 to FE.
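
  The code ranges and the byte ranges above determine how many code points
  each character group contains.  The following Python sketch classifies a
  two-byte code and counts the valid codes in each range.  The names GROUPS,
  is_valid_code, classify, and count_codes are hypothetical and are used for
  illustration only.

```python
# Code ranges (hexadecimal) copied from the list on this reference page.
GROUPS = [
    ("Special symbols", 0xA140, 0xA3BF),
    ("Level 1 characters", 0xA440, 0xC67E),
    ("Level 2 characters", 0xC940, 0xF9D5),
    ("Level 1 user-defined space", 0xFA40, 0xFEFE),
    ("Level 2 user-defined space", 0x8E40, 0xA0FE),
    ("Level 3 user-defined space", 0x8140, 0x8DFE),
]


def is_valid_code(code):
    """Return True if both bytes of a two-byte code are in the valid ranges."""
    first, second = code >> 8, code & 0xFF
    return 0x81 <= first <= 0xFE and (
        0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE
    )


def classify(code):
    """Return the character group of a two-byte code, or None if unassigned."""
    if not is_valid_code(code):
        return None
    for name, low, high in GROUPS:
        if low <= code <= high:
            return name
    return None


def count_codes(low, high):
    """Count the valid two-byte codes from low to high, inclusive."""
    return sum(1 for code in range(low, high + 1) if is_valid_code(code))


if __name__ == "__main__":
    for name, low, high in GROUPS:
        print("%-28s %5d" % (name, count_codes(low, high)))
```

  Under these assumptions, the computed counts agree with the figures given
  under Big-5 Character Groups: each full first-byte row holds 157 codes (63
  with a second byte of 40 to 7E plus 94 with a second byte of A1 to FE).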

  Codeset Conversion


  The following codeset converter pairs are available for converting
  Traditional Chinese characters between big5 and other encoding formats.
  Refer to iconv_intro(5) for an introduction to codeset conversion. For more
  information about the other codeset for which big5 is the input or output,
  see the reference page specified in the list item.

    +  dechanyu_big5, big5_dechanyu

       Converting from and to DEC Hanyu: dechanyu(5)

    +  dechanzi_big5, big5_dechanzi

       Converting from and to DEC Hanzi: dechanzi(5)

    +  eucTW_big5, big5_eucTW

       Converting from and to Taiwanese Extended UNIX Code: eucTW(5)

    +  sbig5_big5, big5_sbig5

       Converting from and to Shift Big-5: sbig5(5)

    +  telecode_big5, big5_telecode

       Converting from and to Telecode: telecode(5)

    +  UCS-2_big5, big5_UCS-2

       Converting from and to UCS-2: Unicode(5)

    +  UCS-4_big5, big5_UCS-4

       Converting from and to UCS-4: Unicode(5)

    +  UTF-8_big5, big5_UTF-8

       Converting from and to UTF-8: Unicode(5)

                                     Note

       The big5 encoding format is identical to the encoding format used in
       PC code pages that support Traditional Chinese. Therefore, you can use
       codeset converters that convert between big5 and UCS-2, UCS-4, or
       UTF-8 to convert Traditional Chinese data between PC code-page and
       Unicode encoding formats. Refer to code_page(5) for a discussion of
       how the operating system supports PC code pages.
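
  As an illustration of a round trip between big5 and UTF-8, the following
  Python sketch uses Python's built-in "big5" codec.  This codec is an
  assumption standing in for the big5_UTF-8 and UTF-8_big5 converters listed
  above; its mapping tables are not guaranteed to match those of the
  operating system's converters, particularly in the user-defined spaces.
  The function names are hypothetical.

```python
def big5_to_utf8(data):
    """Convert big5-encoded bytes to UTF-8 bytes (Python codec, not iconv)."""
    return data.decode("big5").encode("utf-8")


def utf8_to_big5(data):
    """Convert UTF-8 bytes to big5-encoded bytes (Python codec, not iconv)."""
    return data.decode("utf-8").encode("big5")


# Two Traditional Chinese ideographs (U+4E2D, U+6587) followed by ASCII text.
text = "\u4e2d\u6587 ASCII"
big5_bytes = text.encode("big5")
utf8_bytes = big5_to_utf8(big5_bytes)

if __name__ == "__main__":
    print(big5_bytes.hex())
    print(utf8_bytes.hex())
```

  In this sketch the ASCII characters occupy one byte each in both
  encodings, while each ideograph occupies two bytes in big5 and three bytes
  in UTF-8.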


  Fonts for Big-5 Characters


  The operating system supports Big-5 code by internally converting
  characters to DEC Hanyu. Therefore, DEC Hanyu fonts are used for Big-5
  characters.  Both display and printer fonts are provided for DEC Hanyu and
  these are listed in the dechanyu(5) reference page.

  For general information about printer support for and codeset conversion of
  Asian text, refer to i18n_printing(5).

SEE ALSO

  Commands: locale(1)

  Others: ascii(5), Chinese(5), code_page(5), dechanyu(5), dechanzi(5),
  eucTW(5), GB18030(5), GBK(5), i18n_intro(5), i18n_printing(5),
  iconv_intro(5), l10n_intro(5), sbig5(5), telecode(5), Unicode(5)