Manual: (OSF1-V5.1-alpha)



iconv_intro(5)						       iconv_intro(5)



NAME

  iconv_intro, iconv - Introduction to codeset conversion

DESCRIPTION

  Conversion of character encoding from one coded character set (codeset) to
  another is an operation that the operating system and some applications
  often have to perform. For example, the man command supports codeset
  conversion so that one set of reference page files can meet the needs of
  locales that support the same language and territory but different
  codesets (see man(1)).

  The following commands and library interfaces give users and application
  developers direct access to codeset conversion operations:

    +  The iconv command converts characters in a data file from one codeset
       to another (see iconv(1)).

    +  The iconv(), iconv_open(), and iconv_close() functions convert a
       string of characters from one codeset to another (see iconv(3),
       iconv_open(3), and iconv_close(3)). The iconv command uses these
       interfaces to convert characters.
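
  The following C sketch shows the open, convert, and close sequence that
  these interfaces imply, using the POSIX signatures of iconv_open(),
  iconv(), and iconv_close(). The helper name convert_string() is
  hypothetical, and the codeset names used with it (such as ISO-8859-1 and
  UTF-8) are platform-specific assumptions; Tru64 UNIX may spell them
  differently.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated string `in` from codeset
 * `from` to codeset `to`. At most outsize - 1 bytes are written to `out`,
 * followed by a NUL (outsize must be at least 1). Returns 0 on success or
 * -1 on failure, with errno as set by iconv_open() or iconv().
 */
int convert_string(const char *from, const char *to,
                   const char *in, char *out, size_t outsize)
{
    /* iconv_open() takes the destination codeset first. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inptr = (char *)in;      /* iconv() advances this pointer */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;  /* leave room for the terminating NUL */

    int rc = 0;
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1)
        rc = -1;                   /* EILSEQ, EINVAL, or E2BIG */
    else if (iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t)-1)
        rc = -1;                   /* could not write final shift sequence */

    int saved_errno = errno;       /* keep iconv()'s errno across close */
    iconv_close(cd);
    errno = saved_errno;

    *outptr = '\0';
    return rc;
}
```

  For example, convert_string("ISO-8859-1", "UTF-8", "caf\xe9", buf,
  sizeof buf) should leave the two-byte UTF-8 form of e-acute at the end of
  buf.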

  There are two types of codeset converters: algorithmic and table.
  Algorithmic converters, which reside in the /usr/lib/nls/loc/iconv
  directory, are shared libraries with a predefined entry point that
  functions in the libiconv.so library invoke. Algorithmic converters are
  needed to convert multibyte codesets, partly because table converters
  cannot handle the required number of character values and partly because
  some of these codesets require complex handling (see NOTES). Algorithmic
  converters are supplied as part of the operating system product; the
  internal interfaces that they require are not published for external use.

  Table converters, which reside in the /usr/lib/nls/loc/iconvTable
  directory, can be created with the genxlt command (see genxlt(1)). These
  converters can support single-byte codesets, which have at most 256
  encoded character values.
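
  To illustrate why a single-byte codeset fits in a table, the following C
  sketch models a table converter as a 256-entry array indexed by the input
  byte. The byte_table type and table_convert() function are hypothetical;
  they do not describe the file format that genxlt produces or the
  interfaces that libiconv.so uses.

```c
#include <stddef.h>

/* Hypothetical model of a table converter: one entry per input byte. */
typedef struct {
    unsigned char map[256];    /* output byte for each input byte */
    unsigned char valid[256];  /* 0 if the byte has no counterpart */
} byte_table;

/*
 * Convert n bytes from `in` to `out` (which must hold n bytes). Returns 0
 * on success, or -1 at the first byte with no mapping, storing its index
 * in *pos (analogous to an invalid character error).
 */
int table_convert(const byte_table *t, const unsigned char *in, size_t n,
                  unsigned char *out, size_t *pos)
{
    for (size_t i = 0; i < n; i++) {
        if (!t->valid[in[i]]) {
            *pos = i;
            return -1;
        }
        out[i] = t->map[in[i]];
    }
    return 0;
}
```

  Because every possible input byte has exactly one slot, a lookup never
  needs more than the 256 entries; multibyte codesets do not fit this
  model, which is one reason they require algorithmic converters.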

  Names of codeset converters are in the following form:

  from-codeset_to-codeset

  For example, the following converter converts values from Super DEC Kanji
  to Japanese Extended UNIX Code:

  sdeckanji_eucJP

  The codeset converters produce an invalid character error in response to
  characters that cannot be converted from the source codeset to the
  destination codeset. This error is always produced for character codes
  that are invalid in the source codeset. However, if the error results
  from characters that are valid in the source codeset but have no
  counterparts in the destination codeset, you can eliminate the error by
  defining the ICONV_DEFSTR environment variable to specify a substitute
  output string. See the ENVIRONMENT VARIABLES section for more information
  about using the ICONV_DEFSTR variable.
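
  An application that calls iconv() directly can get an effect similar to
  ICONV_DEFSTR by writing its own substitute whenever iconv() stops on a
  character. The following C sketch assumes UTF-8 input and uses the
  hypothetical names convert_with_subst() and utf8_len(). Unlike the
  converter behavior described above, it does not distinguish invalid input
  from characters that merely have no counterpart in the destination
  codeset, because implementations such as glibc report both cases through
  errno EILSEQ.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence that starts with byte b; 1 for a byte that
   cannot start a sequence, so the caller always makes progress. */
static size_t utf8_len(unsigned char b)
{
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 1;
}

/*
 * Hypothetical: convert inlen bytes of UTF-8 from `in` to codeset `to`,
 * writing `subst` in place of each character that iconv() rejects with
 * EILSEQ. Returns the number of bytes written to `out`, or (size_t)-1 on
 * any other error (including running out of output space).
 */
size_t convert_with_subst(const char *to, const char *in, size_t inlen,
                          char *out, size_t outsize, const char *subst)
{
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in, *op = out;
    size_t ileft = inlen, oleft = outsize, slen = strlen(subst);
    size_t result = (size_t)-1;

    for (;;) {
        if (iconv(cd, &ip, &ileft, &op, &oleft) != (size_t)-1) {
            /* All input consumed; flush any final shift state. */
            if (iconv(cd, NULL, NULL, &op, &oleft) != (size_t)-1)
                result = (size_t)(op - out);
            break;
        }
        if (errno != EILSEQ || oleft < slen)
            break;                  /* E2BIG, EINVAL, or no room */
        memcpy(op, subst, slen);    /* emit the substitute string */
        op += slen;
        oleft -= slen;
        size_t skip = utf8_len((unsigned char)*ip);
        if (skip > ileft)
            skip = ileft;
        ip += skip;                 /* skip the rejected sequence */
        ileft -= skip;
    }
    iconv_close(cd);
    return result;
}
```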

  Data can be converted directly between two codesets or by way of an
  intermediate codeset, such as UCS-2, UCS-4, or UTF-8. When converting
  Chinese characters, be aware that converting a Traditional Chinese
  codeset directly to a Simplified Chinese codeset may not give the same
  results as converting Traditional Chinese first to UCS-2, UCS-4, or UTF-8
  and then to Simplified Chinese.
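
  Converting by way of an intermediate codeset amounts to chaining two
  conversion descriptors. The following C sketch does this with a
  caller-supplied scratch buffer. The helper names convert_via() and
  convert_step() are hypothetical, and glibc-style codeset names such as
  ISO-8859-1 are assumptions that may not match Tru64 UNIX converter names.

```c
#include <iconv.h>
#include <stddef.h>

/* One conversion pass over a buffer. Returns the number of output bytes,
   or (size_t)-1 if the conversion could not be completed. */
static size_t convert_step(const char *from, const char *to,
                           const char *in, size_t inlen,
                           char *out, size_t outsize)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in, *op = out;
    size_t ileft = inlen, oleft = outsize;
    size_t rc = iconv(cd, &ip, &ileft, &op, &oleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &op, &oleft);  /* flush shift state */
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : (size_t)(op - out);
}

/* Convert from -> via -> to, keeping the intermediate data in tmp. */
size_t convert_via(const char *from, const char *via, const char *to,
                   const char *in, size_t inlen,
                   char *tmp, size_t tmpsize, char *out, size_t outsize)
{
    size_t mid = convert_step(from, via, in, inlen, tmp, tmpsize);
    if (mid == (size_t)-1)
        return (size_t)-1;
    return convert_step(via, to, tmp, mid, out, outsize);
}
```

  Each step uses its own converter, so the result can differ from a direct
  conversion whenever the two paths map characters differently, as noted
  above for Chinese codesets.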

ENVIRONMENT VARIABLES

  Some codeset converters require more complex algorithms than tables can
  provide. The following environment variables control conversion behavior
  for different kinds of codeset converters:

  ICONV_ACTION
      Controls the behavior of many-to-one value conversions when
      Traditional Chinese (except Traditional Chinese encoded in Telecode)
      is converted to Simplified Chinese. The valid settings for this
      environment variable are as follows:

      batch
          Specifies that the preferred mapping value (the first one in the
          one-to-many mapping list) is always taken. The batch setting is
          the ICONV_ACTION default.

      conv_all
          Specifies that all the possible values are printed to the
          standard output, enclosed by braces ({ }), so that the user can
          later manually edit the converted file and select the one to use.

      conv_all_nosym
          Specifies that all the possible values are printed to the
          standard output except for punctuation symbols, for which only
          the preferred mapping value is printed. As with conv_all, the
          conv_all_nosym setting prints value choices enclosed by braces so
          that the converted file can later be edited.

  ICONV_BYTEORDER
      Sets byte ordering for UCS-2 or UCS-4 converters only. Valid values
      are little-endian (the default) and big-endian. Setting this
      environment variable may be necessary when producing UCS-2 or UCS-4
      output that will be processed by codeset converters on platforms
      other than Tru64 UNIX.
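
      The two settings differ only in the order in which the bytes of each
      code unit are written. The following C sketch, with the hypothetical
      helpers put_ucs2() and put_ucs4(), shows both orders; it does not show
      how the converters themselves read ICONV_BYTEORDER.

```c
#include <stdint.h>

/* Store a 16-bit UCS-2 code unit; big_endian selects the byte order. */
void put_ucs2(unsigned char out[2], uint16_t unit, int big_endian)
{
    unsigned char hi = (unsigned char)(unit >> 8);
    unsigned char lo = (unsigned char)(unit & 0xFF);
    out[0] = big_endian ? hi : lo;
    out[1] = big_endian ? lo : hi;
}

/* Store a 32-bit UCS-4 value; little-endian puts the low byte first. */
void put_ucs4(unsigned char out[4], uint32_t value, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        /* byte i, counting from the least significant end */
        unsigned char b = (unsigned char)(value >> (8 * i));
        out[big_endian ? 3 - i : i] = b;
    }
}
```

      For example, U+0041 (A) is written as the bytes 41 00 in
      little-endian UCS-2 and as 00 41 in big-endian UCS-2.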

  ICONV_DEFSTR[_from-codeset_to-codeset]
      Defines the default string to be substituted in output for valid
      input characters that cannot be converted from the source codeset to
      the destination codeset. The variable value can be an arbitrary string
      or a code number. If the value is a code number (for example, 10, 07,
      0x10, or, for Unicode converters, U+1234), the corresponding character
      in the output codeset (to-codeset) is printed.

      For a given type of codeset conversion, a matching
      ICONV_DEFSTR_from-codeset_to-codeset variable takes precedence over
      the ICONV_DEFSTR variable without the from-codeset_to-codeset suffix.
      When defining the variable with the suffix, replace
      from-codeset_to-codeset with the name of the codeset converter to
      which the variable applies. A converter uses the ICONV_DEFSTR variable
      (defined without the suffix) when no
      ICONV_DEFSTR_from-codeset_to-codeset variable has been defined
      specifically for the type of conversion being done.

      If these variables are not defined or are set to the null string,
      characters that cannot be converted are skipped and have no
      representation in the converted output.
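
      The precedence rule amounts to a two-step lookup. The following C
      sketch uses the standard getenv() function; the name defstr_lookup()
      is hypothetical, and treating a suffixed variable that is set to the
      null string as undefined (so that plain ICONV_DEFSTR still applies)
      is an assumption, because this page does not say how that case is
      resolved.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical: return the substitute value for converter from_to, or
 * NULL if unconvertible characters should simply be skipped.
 */
const char *defstr_lookup(const char *from, const char *to)
{
    char name[256];
    int n = snprintf(name, sizeof name, "ICONV_DEFSTR_%s_%s", from, to);

    /* 1. Converter-specific variable, e.g. ICONV_DEFSTR_sdeckanji_eucJP. */
    if (n > 0 && (size_t)n < sizeof name) {
        const char *v = getenv(name);
        if (v != NULL && v[0] != '\0')
            return v;
    }
    /* 2. Fall back to the general ICONV_DEFSTR variable. */
    const char *v = getenv("ICONV_DEFSTR");
    return (v != NULL && v[0] != '\0') ? v : NULL;
}
```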

      The following converter-specific restrictions apply to ICONV_DEFSTR*
      variables:

        +  ICONV_DEFSTR* environment variables do not work for converters
           that convert between Japanese codesets or between Korean
           codesets.

        +  For converters that handle UCS-2, UCS-4, or UTF-8 format, the
           only valid variable value is a code number (such as U+1234 or
           0x10) or a string whose value is a single ASCII character (such
           as ?). For these converters, any string value other than a single
           ASCII character is ignored, and any characters that cannot be
           converted have no representation in output.

        +  For converters that handle output in UCS-2, UCS-4, or UTF-8
           format, characters that cannot be converted and for which no
           valid ICONV_DEFSTR* value has been defined produce an error
           condition that aborts the conversion process.

  ICONV_NOBOM
      Disables generation of the byte-order mark at the beginning of UCS-2
      or UCS-4 output. A valid setting is any value other than a null
      string. By default, or if this variable is set to a null string, the
      byte-order mark is generated at the beginning of UCS-2 or UCS-4
      output.

      Codeset converters that process UCS-2 or UCS-4 data on platforms
      other than Tru64 UNIX usually require the byte-order mark. Therefore,
      the default behavior of current Tru64 UNIX codeset converters produces
      output that is more likely to be accepted as input by codeset
      converters on other platforms. Use the ICONV_NOBOM variable only if
      you need backward compatibility with output produced by codeset
      converters included in versions of Tru64 UNIX prior to Version 4.0D.
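
      A converter that requires the byte-order mark typically uses it to
      determine the byte order of the data that follows. The following C
      sketch shows that check for the mark U+FEFF; detect_bom() and enum
      bom_order are hypothetical names, and the caller is assumed to know
      whether the data is UCS-2 or UCS-4.

```c
#include <stddef.h>
#include <string.h>

enum bom_order { BOM_NONE, BOM_LE, BOM_BE };

/*
 * Report the byte order indicated by a leading U+FEFF in UCS-2 (width 2)
 * or UCS-4 (width 4) data of n bytes, or BOM_NONE if no mark is present.
 */
enum bom_order detect_bom(const unsigned char *p, size_t n, int width)
{
    static const unsigned char le2[] = { 0xFF, 0xFE };
    static const unsigned char be2[] = { 0xFE, 0xFF };
    static const unsigned char le4[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char be4[] = { 0x00, 0x00, 0xFE, 0xFF };

    size_t w = (width == 4) ? 4 : 2;
    const unsigned char *le = (w == 4) ? le4 : le2;
    const unsigned char *be = (w == 4) ? be4 : be2;

    if (n < w)
        return BOM_NONE;
    if (memcmp(p, le, w) == 0)
        return BOM_LE;
    if (memcmp(p, be, w) == 0)
        return BOM_BE;
    return BOM_NONE;
}
```

      Output generated with ICONV_NOBOM set would return BOM_NONE here, so
      a consumer would have to learn the byte order some other way.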

  ICONV_PHRCONV
      Activates phrase conversion for converters that convert from a
      Traditional Chinese codeset (except Traditional Chinese encoded in
      Telecode) to a Simplified Chinese codeset, or the reverse. When phrase
      conversion is activated, a whole phrase in Traditional Chinese is
      converted to a different phrase in Simplified Chinese, or the reverse.

      If ICONV_PHRCONV is set to mark, the converted phrases are bracketed
      by [ and ] to highlight the conversion result for visual checking.

      The phrase conversion databases in the /usr/share/phrdb directory are
      normal text files with the same file names as the algorithmic
      converters in /usr/lib/nls/loc/iconv/*. These phrase conversion
      databases contain entries for phrase conversion pairs.

FILES

  /usr/lib/nls/loc/iconv/*
      Algorithmic converters

  /usr/lib/nls/loc/iconvTable/*
      Table converters

  /usr/share/phrdb/*
      Phrase conversion databases

SEE ALSO

  Commands: genxlt(1), iconv(1), phrase(1)

  Functions: iconv(3), iconv_close(3), iconv_open(3)

  Others: i18n_intro(5), l10n_intro(5)