Manual: (OSF1-V5.1-alpha)

code_page(5)							 code_page(5)



NAME

  code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
  cp862, cp863, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950,
  cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258,
  dingbats, symbol - Coded character sets that are used on Microsoft Windows
  and NT systems

DESCRIPTION

  Code pages are coded character sets that are used on Microsoft Windows,
  Windows 95, and NT systems. Just as there are different UNIX codesets,
  there are different PC code pages, each supporting a particular set of
  character encodings.

  A Tru64 UNIX system supplies one locale, en_US.cp850, that directly
  supports a PC code-page format (MS-DOS Latin 1). For all other locales,
  data in code-page format is supported only through codeset converters.
  These converters can be run directly by users or by software or
  applications that exchange data between PC and Tru64 UNIX systems. Fonts
  and other kinds of character support are available only for the native
  UNIX codeset to which a code page can be converted. See the i18n_intro(5)
  reference page for introductory information on locales and codesets. See
  the iconv_intro(5) reference page for an introduction to codeset
  conversion and the name format and location of codeset converters.
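
  The conversion path described above can be illustrated with a short
  sketch. It is not the Tru64 UNIX converter interface: Python's built-in
  codecs stand in for the iconv converters, and the helper name convert()
  and the sample data are assumptions made for illustration only.

```python
# Sketch: convert cp850 (MS-DOS Latin 1) data to Unicode encodings and back.
# Python codecs are used here as a stand-in for Tru64 UNIX iconv converters.

def convert(data: bytes, from_codeset: str, to_codeset: str) -> bytes:
    """Decode bytes from one codeset and re-encode them in another."""
    return data.decode(from_codeset).encode(to_codeset)

pc_data = b"Caf\x82"  # the text "Café" as stored in cp850 (0x82 is e-acute)

utf8_data = convert(pc_data, "cp850", "utf-8")
ucs4_data = convert(pc_data, "cp850", "utf-32-be")  # fixed-width 4-byte form

# Converting back to the code page restores the original bytes.
round_trip_ok = convert(utf8_data, "utf-8", "cp850") == pc_data
print(utf8_data, len(ucs4_data), round_trip_ok)
```

  On a Tru64 UNIX system the equivalent step would normally be done with
  iconv(1) or iconv(3), using the converter names documented in
  iconv_intro(5).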

  The following table lists and describes the code pages that have conversion
  support on a Tru64 UNIX system. An asterisk (*) follows the names of code
  pages that include support for the Euro currency sign (€).

  ______________________________________________________________
  Code Page            Description
  ______________________________________________________________
  cp437                MS-DOS United States
  cp737                Greek
  cp775                Baltic languages (1)
  cp850                MS-DOS Multilingual (Latin-1)
  cp852                MS-DOS Slavic (Latin-2)
  cp855                IBM Cyrillic
  cp857                IBM Turkish
  cp860                MS-DOS Portuguese
  cp861                MS-DOS Icelandic
  cp862                Hebrew
  cp863                MS-DOS Canadian French
  cp865                MS-DOS Nordic languages
  cp866                MS-DOS Russian
  cp869                IBM Modern Greek
  cp874 *              MS-DOS Thai
  cp932                Japanese
  cp936                Chinese (People's Republic of China)
  cp949                Korean
  cp950                Chinese (Hong Kong)
  cp1250 *             Windows Latin-2
  cp1251 *             Windows Cyrillic
  cp1252 *             Windows Latin-1
  cp1253 *             Windows Greek
  cp1254 *             Windows Turkish
  cp1255 *             Windows Hebrew
  cp1256 *             Windows Arabic
  cp1257 *             Windows Baltic (1)
  cp1258 *             Windows Vietnamese
  dingbats             Microsoft dingbat characters
  symbol               Microsoft miscellaneous symbol characters
  ______________________________________________________________

  (1) Baltic languages include Estonian, Latvian, and Lithuanian.

  (2) Latin-2 languages include Albanian, Croatian, Czech, Faeroese,
  Hungarian, Polish, Romanian, Latin Serbian, Slovak, and Slovenian.

  (3) Cyrillic languages include Byelorussian, Bulgarian, and Russian.

  In all cases, a code page can be converted to and from the UCS-2, UCS-4,
  and UTF-8 codesets. In addition, some code pages can be converted directly
  to ISO codesets as shown in the following table, although some data loss
  may occur.

  _________________________________________
  Code Page   Can Be Converted Directly to:
  _________________________________________
  cp437       ISO8859-1
  cp737       ISO8859-7
  cp775       ISO8859-4
  cp850       ISO8859-1
  cp852       ISO8859-2
  cp855       ISO8859-5
  cp857       ISO8859-9
  cp860       ISO8859-1
  cp861       ISO8859-1
  cp862       ISO8859-8
  cp863       ISO8859-1
  cp865       ISO8859-1
  cp866       ISO8859-5
  cp869       ISO8859-7
  cp874       TACTIS
  cp1252      ISO8859-1, ISO8859-15
  _________________________________________
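
  The data loss mentioned for direct conversions can be made concrete with
  a small sketch. It assumes Python's codec tables as a stand-in for the
  Tru64 UNIX converters; the names DIRECT_TARGETS and convert_direct() are
  hypothetical, and replacing unconvertible characters with "?" is an
  illustrative policy, not documented converter behavior. The cp874 entry
  is left out because Python ships no codec named TACTIS.

```python
# Direct conversions from the table above, written with Python codec names.
DIRECT_TARGETS = {
    "cp437": "iso8859-1", "cp737": "iso8859-7", "cp775": "iso8859-4",
    "cp850": "iso8859-1", "cp852": "iso8859-2", "cp855": "iso8859-5",
    "cp857": "iso8859-9", "cp860": "iso8859-1", "cp861": "iso8859-1",
    "cp862": "iso8859-8", "cp863": "iso8859-1", "cp865": "iso8859-1",
    "cp866": "iso8859-5", "cp869": "iso8859-7", "cp1252": "iso8859-1",
}

def convert_direct(data: bytes, code_page: str):
    """Return (converted bytes, characters the ISO codeset cannot hold)."""
    target = DIRECT_TARGETS[code_page]
    text = data.decode(code_page)
    lost = []
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            lost.append(ch)
    # Characters with no equivalent in the target are replaced with "?".
    return text.encode(target, errors="replace"), lost

# cp437 box-drawing characters have no ISO8859-1 equivalent; the pound
# sign (0x9C in cp437) does, and moves to 0xA3.
converted, lost = convert_direct(b"\xc9\xcd\xbb Total: 5\x9c", "cp437")
print(converted, lost)
```

  Going through UCS-2, UCS-4, or UTF-8 instead avoids this loss, because
  those codesets can represent every character in the code page.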

  See Unicode(5) for information about UCS-2, UCS-4, and UTF-8. Reference
  pages for UNIX implementations of the ISO codesets have the name format
  iso8859-number(5).

  For Traditional Chinese and Japanese, there are no codeset converters whose
  names include the name of a code page because identical character encoding
  is provided in existing UNIX codesets. For Traditional Chinese, character
  encoding in PC code-page format (cp950) is identical to that in the Big-5
  (big5) codeset. For Japanese, character encoding in PC code-page format
  (cp932) is identical to that in the Shift JIS (SJIS) codeset. Therefore,
  the codeset converters whose names include big5 and SJIS can be used to
  convert data in and out of PC code-page format for the supported languages.
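
  As a rough spot check of the equivalence described above, the following
  sketch compares the bytes produced for two sample strings. It relies on
  Python's codec tables, which are not the Tru64 UNIX tables and may treat
  vendor-specific extension characters differently; the sample strings and
  the helper same_encoding() are assumptions chosen for illustration.

```python
# Pairs of (PC code page, UNIX codeset) with a sample string for each.
SAMPLES = {
    ("cp932", "shift_jis"): "日本語",   # Japanese
    ("cp950", "big5"): "中文",          # Traditional Chinese
}

def same_encoding(text: str, code_page: str, unix_codeset: str) -> bool:
    """True if both codesets encode the text to the same byte sequence."""
    return text.encode(code_page) == text.encode(unix_codeset)

results = {pair: same_encoding(text, *pair) for pair, text in SAMPLES.items()}
for (code_page, unix_codeset), identical in results.items():
    print(code_page, unix_codeset, identical)
```

  A check like this covers only the characters in the samples; it does not
  establish that two codeset tables agree everywhere.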

            Caution for Conversion of Korean and Simplified Chinese

       Conversion of text that starts out in code-page format (cp949) to the
       DEC Korean (deckorean) codeset may result in loss of data. All of the
       Tru64 UNIX codeset equivalents for cp949 support all the Hanja and
       miscellaneous characters also supported by the code page. However,
       only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of
       Hangul characters supported by the cp949 code page. The deckorean
       codeset supports only a subset of these Hangul characters. Therefore,
       if data is converted from cp949 format to UCS-2, UCS-4, or UTF-8, no
       data is lost. However, if the data is then converted from UCS-2,
       UCS-4, or UTF-8 to deckorean, the unsupported Hangul characters will
       be lost.
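
       The size of the Hangul gap can be estimated with a short sketch.
       It assumes that the deckorean subset corresponds to the Hangul
       syllables of the KS C 5601 (KS X 1001) standard, which cp949 stores
       with both bytes in the range 0xA1-0xFE; that mapping is an
       assumption and is not stated on this page. Python's cp949 codec
       stands in for the PC code page, and the function names are
       hypothetical.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # Unicode Hangul Syllables block

def in_ksx1001_region(ch: str) -> bool:
    """True if the cp949 bytes for ch both lie in the range 0xA1-0xFE."""
    encoded = ch.encode("cp949")
    return len(encoded) == 2 and all(0xA1 <= b <= 0xFE for b in encoded)

def hangul_at_risk(text: str) -> list:
    """List Hangul syllables in text that fall outside the assumed subset."""
    return [ch for ch in text
            if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST
            and not in_ksx1001_region(ch)]

all_syllables = [chr(cp) for cp in range(HANGUL_FIRST, HANGUL_LAST + 1)]
subset_size = sum(in_ksx1001_region(ch) for ch in all_syllables)

print(len(all_syllables), subset_size)
print(hangul_at_risk("한글 똠"))
```

       Running a check of this kind before a final conversion step shows
       which syllables would need attention if the target codeset is the
       smaller one.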

       The DEC Hanzi (dechanzi) codeset uses the same encoding format as the
       PC code page used for Simplified Chinese (cp936) but does not support
       all the characters supported by the code page. Therefore, you can use
       converters with dechanzi in the converter name to convert text to and
       from cp936 format, but the operation may result in some loss of data.
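
       The same kind of check can be sketched for Simplified Chinese. Here
       Python's gb2312 codec stands in for dechanzi, on the assumption
       that DEC Hanzi covers the GB 2312 character set; that equivalence
       is not stated on this page. The function name lost_in_subset() and
       the sample text are illustrative.

```python
def lost_in_subset(data: bytes, code_page: str = "cp936",
                   subset: str = "gb2312") -> list:
    """List characters in code-page data that the subset codeset lacks."""
    text = data.decode(code_page)
    lost = []
    for ch in text:
        try:
            ch.encode(subset)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost

pc_data = "朱镕基".encode("cp936")  # the middle character is outside GB 2312
print(lost_in_subset(pc_data))

# Characters present in both codesets use the same bytes, which matches
# the "same encoding format" statement above.
shared_bytes_match = "朱基".encode("cp936") == "朱基".encode("gb2312")
print(shared_bytes_match)
```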

SEE ALSO

  Commands: iconv(1)

  Functions: iconv(3), iconv_close(3), iconv_open(3)

  Others: i18n_intro(5), iconv_intro(5), iso8859-1(5), iso8859-2(5),
  iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-15(5),
  Unicode(5)