Standards, Environments, and Macros              iconv_unicode(5)



NAME
     iconv_unicode - code set conversion tables for Unicode

DESCRIPTION
     The following code set conversions are supported:

                         CODE SET CONVERSIONS SUPPORTED
                         ------------------------------
       FROM Code Set                               TO Code Set
           Code              FROM          Target Code            TO
                             Filename                             Filename
                             Element                              Element

     ISO 8859-1 (Latin 1)    8859-1            UTF-8               UTF-8
     ISO 8859-2 (Latin 2)    8859-2            UTF-8               UTF-8
     ISO 8859-3 (Latin 3)    8859-3            UTF-8               UTF-8
     ISO 8859-4 (Latin 4)    8859-4            UTF-8               UTF-8
     ISO 8859-5 (Cyrillic)   8859-5            UTF-8               UTF-8
     ISO 8859-6 (Arabic)     8859-6            UTF-8               UTF-8
     ISO 8859-7 (Greek)      8859-7            UTF-8               UTF-8
     ISO 8859-8 (Hebrew)     8859-8            UTF-8               UTF-8
     ISO 8859-9 (Latin 5)    8859-9            UTF-8               UTF-8
     ISO 8859-10 (Latin 6)   8859-10           UTF-8               UTF-8
     Japanese EUC            eucJP             UTF-8               UTF-8
     Chinese/PRC EUC
     (GB 2312-1980)          gb2312            UTF-8               UTF-8
     ISO-2022                iso2022           UTF-8               UTF-8
     Korean EUC              ko_KR-euc         Korean UTF-8        ko_KR-UTF-8
     ISO-2022-KR             ko_KR-iso2022-7   Korean UTF-8        ko_KR-UTF-8
     Korean Johap
     (KS C 5601-1987)        ko_KR-johap       Korean UTF-8        ko_KR-UTF-8
     Korean Johap
     (KS C 5601-1992)        ko_KR-johap92     Korean UTF-8        ko_KR-UTF-8
     Korean UTF-8            ko_KR-UTF-8       Korean EUC          ko_KR-euc
     Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap
                                               (KS C 5601-1987)
     Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap92
                                               (KS C 5601-1992)
     KOI8-R (Cyrillic)       KOI8-R            UCS-2               UCS-2
     KOI8-R (Cyrillic)       KOI8-R            UTF-8               UTF-8
     PC Kanji (SJIS)         PCK               UTF-8               UTF-8
     PC Kanji (SJIS)         SJIS              UTF-8               UTF-8
     UCS-2                   UCS-2             KOI8-R (Cyrillic)   KOI8-R
     UCS-2                   UCS-2             UCS-4               UCS-4

SunOS 5.9           Last change: 18 Apr 1997                    1

                         CODE SET CONVERSIONS SUPPORTED
                         ------------------------------
       FROM Code Set                               TO Code Set
           Code              FROM          Target Code            TO
                             Filename                             Filename
                             Element                              Element

     UCS-2              UCS-2           UTF-7                   UTF-7
     UCS-2              UCS-2           UTF-8                   UTF-8
     UCS-4              UCS-4           UCS-2                   UCS-2
     UCS-4              UCS-4           UTF-16                  UTF-16
     UCS-4              UCS-4           UTF-7                   UTF-7
     UCS-4              UCS-4           UTF-8                   UTF-8
     UTF-16             UTF-16          UCS-4                   UCS-4
     UTF-16             UTF-16          UTF-8                   UTF-8
     UTF-7              UTF-7           UCS-2                   UCS-2
     UTF-7              UTF-7           UCS-4                   UCS-4
     UTF-7              UTF-7           UTF-8                   UTF-8
     UTF-8              UTF-8           ISO 8859-1 (Latin 1)    8859-1
     UTF-8              UTF-8           ISO 8859-2 (Latin 2)    8859-2
     UTF-8              UTF-8           ISO 8859-3 (Latin 3)    8859-3
     UTF-8              UTF-8           ISO 8859-4 (Latin 4)    8859-4
     UTF-8              UTF-8           ISO 8859-5 (Cyrillic)   8859-5
     UTF-8              UTF-8           ISO 8859-6 (Arabic)     8859-6
     UTF-8              UTF-8           ISO 8859-7 (Greek)      8859-7
     UTF-8              UTF-8           ISO 8859-8 (Hebrew)     8859-8
     UTF-8              UTF-8           ISO 8859-9 (Latin 5)    8859-9
     UTF-8              UTF-8           ISO 8859-10 (Latin 6)   8859-10
     UTF-8              UTF-8           Japanese EUC            eucJP
     UTF-8              UTF-8           Chinese/PRC EUC         gb2312
                                        (GB 2312-1980)
     UTF-8              UTF-8           ISO-2022                iso2022
     UTF-8              UTF-8           KOI8-R (Cyrillic)       KOI8-R
     UTF-8              UTF-8           PC Kanji (SJIS)         PCK
     UTF-8              UTF-8           PC Kanji (SJIS)         SJIS
     UTF-8              UTF-8           UCS-2                   UCS-2
     UTF-8              UTF-8           UCS-4                   UCS-4
     UTF-8              UTF-8           UTF-16                  UTF-16
     UTF-8              UTF-8           UTF-7                   UTF-7
     UTF-8              UTF-8           Chinese/PRC EUC         zh_CN.euc
                                        (GB 2312-1980)

                         CODE SET CONVERSIONS SUPPORTED
                         ------------------------------
       FROM Code Set                               TO Code Set
           Code              FROM          Target Code            TO
                             Filename                             Filename
                             Element                              Element

     UTF-8                 UTF-8             ISO 2022-CN           zh_CN.iso2022-7
     UTF-8                 UTF-8             Chinese/Taiwan Big5   zh_TW-big5
     UTF-8                 UTF-8             Chinese/Taiwan EUC    zh_TW-euc
                                             (CNS 11643-1992)
     UTF-8                 UTF-8             ISO 2022-TW           zh_TW-iso2022-7
     Chinese/PRC EUC       zh_CN.euc         UTF-8                 UTF-8
     (GB 2312-1980)
     ISO 2022-CN           zh_CN.iso2022-7   UTF-8                 UTF-8
     Chinese/Taiwan Big5   zh_TW-big5        UTF-8                 UTF-8
     Chinese/Taiwan EUC    zh_TW-euc         UTF-8                 UTF-8
     (CNS 11643-1992)
     ISO 2022-TW           zh_TW-iso2022-7   UTF-8                 UTF-8


EXAMPLES
     Example 1: The library module filename

     In the conversion library, /usr/lib/iconv (see iconv(3C)),
     the library module filename is composed of two symbolic
     elements separated by the percent sign (%). The first symbol
     specifies the code set that is being converted; the second
     symbol specifies the target code, that is, the code set to
     which the first one is being converted.

     In the conversion tables above, the first symbol is termed
     the "FROM Filename Element". The second symbol, representing
     the target code set, is the "TO Filename Element".

     For example, the library module filename to convert from the
     Korean EUC code set to the Korean UTF-8 code set is

     ko_KR-euc%ko_KR-UTF-8
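
     The naming rule above can be sketched in a few lines of
     Python. This is a minimal illustration, not part of the
     conversion library itself: the helper names
     module_filename(), module_path(), and
     split_module_filename() are hypothetical, and the ".so"
     suffix is an assumption based on the FILES entry of this
     page.

```python
import posixpath

ICONV_DIR = "/usr/lib/iconv"  # conversion library directory named on this page
MODULE_SUFFIX = ".so"         # assumption: suffix taken from the FILES entry


def module_filename(from_element, to_element):
    """Join the FROM and TO filename elements with a percent sign."""
    for element in (from_element, to_element):
        if not element or "%" in element or "/" in element:
            raise ValueError("invalid filename element: %r" % (element,))
    return from_element + "%" + to_element


def module_path(from_element, to_element):
    """Hypothetical helper: path where the conversion module would live."""
    name = module_filename(from_element, to_element) + MODULE_SUFFIX
    return posixpath.join(ICONV_DIR, name)


def split_module_filename(filename):
    """Recover (FROM, TO) from a module filename, ignoring a .so suffix."""
    if filename.endswith(MODULE_SUFFIX):
        filename = filename[:-len(MODULE_SUFFIX)]
    from_element, separator, to_element = filename.partition("%")
    if not separator or not from_element or not to_element:
        raise ValueError("not a FROM%%TO module filename: %r" % (filename,))
    return from_element, to_element


if __name__ == "__main__":
    # Korean EUC to Korean UTF-8, as in the example above.
    print(module_filename("ko_KR-euc", "ko_KR-UTF-8"))
    print(module_path("ko_KR-euc", "ko_KR-UTF-8"))
```

     The elements passed in should be taken verbatim from the
     "FROM Filename Element" and "TO Filename Element" columns;
     the sketch does not check that a given pair is actually
     listed as a supported conversion.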

FILES
     /usr/lib/iconv/*.so
           conversion modules

SEE ALSO
     iconv(1), iconv(3C), iconv(5)

     Chernov, A., Registration of a Cyrillic Character  Set,  RFC
     1489, RELCOM Development Team, July 1993.

     Chon, K., H. Je Park, and U. Choi, Korean Character Encoding
     for  Internet  Messages,  RFC  1557,  Solvit  Chosun  Media,
     December 1993.

     Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe
     Transformation Format of Unicode, RFC 1642, Taligent, Inc.,
     July 1994.

     Lee, F., HZ - A Data Format for Exchanging Files of
     Arbitrarily Mixed Chinese and ASCII characters, RFC 1843,
     Stanford University, August 1995.

     Murai, J., M. Crispin, and E. van der Poel, Japanese
     Character Encoding for Internet Messages, RFC 1468, Keio
     University, Panda Programming, June 1993.

     Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding
     for Internet Messages, RFC 1555, Israeli Inter-University,
     Hebrew University, December 1993.

     Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC
     1815, Tokyo Institute of Technology, July 1995.

     Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual
     Extension of ISO-2022-JP, RFC 1554, Tokyo Institute of
     Technology, December 1993.

     Reynolds, J., and J. Postel,  ASSIGNED  NUMBERS,  RFC  1700,
     University   of   Southern  California/Information  Sciences
     Institute, October 1994.

     Simonsen, K., Character Mnemonics & Character Sets, RFC
     1345, Rationel Almen Planlaegning, June 1992.

     Spinellis, D., Greek Character Encoding for Electronic  Mail
     Messages, RFC 1947, SENA S.A., May 1996.

     The Unicode Consortium, The Unicode Standard,  Version  2.0,
     Addison Wesley Developers Press, July 1996.

     Wei, Y., Y. Zhang, J. Li,  J.  Ding,  and  Y.  Jiang,  ASCII
     Printable  Characters-Based  Chinese  Character Encoding for
     Internet Messages, RFC 1842, AsiaInfo Services Inc., Harvard
     University,  Rice University, University of Maryland, August
     1995.

     Yergeau, F., UTF-8, a transformation format of  Unicode  and
     ISO 10646, RFC 2044, Alis Technologies, October 1996.

     Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin,
     Chinese Character Encoding for Internet Messages, RFC 1922,
     Tsinghua University, China Information Technology
     Standardization Technical Committee (CITS), Institute for
     Information Industry (III), University of Washington, March
     1996.

NOTES
     ISO 8859 character sets using  Latin  alphabetic  characters
     are distinguished as follows:

     ISO 8859-1 (Latin 1)
           For most West European languages, including:



           Albanian             Finnish               Italian
           Catalan              French                Norwegian
           Danish               German                Portuguese
           Dutch                Galician              Spanish
           English              Irish                 Swedish
           Faeroese             Icelandic

     ISO 8859-2 (Latin 2)
           For most Latin-written Slavic and Central European
           languages:


           Czech                Polish                Slovak
           German               Rumanian              Slovene
           Hungarian            Croatian


     ISO 8859-3 (Latin 3)
           Popularly used for Esperanto, Galician,  Maltese,  and
           Turkish.

     ISO 8859-4 (Latin 4)
           Introduces  letters   for   Estonian,   Latvian,   and
           Lithuanian.  It  is  an  incomplete predecessor of ISO
           8859-10 (Latin 6).

     ISO 8859-9 (Latin 5)
           Replaces the rarely needed Icelandic  letters  in  ISO
           8859-1 (Latin 1) with the Turkish ones.

     ISO 8859-10 (Latin 6)
           Adds the last Inuit (Greenlandic) and  Sami  (Lappish)
           letters that were not included in ISO 8859-4 (Latin 4)
           to complete coverage of the Nordic area.
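
     The difference between ISO 8859-1 (Latin 1) and ISO 8859-9
     (Latin 5) noted above can be checked with a short Python
     sketch. It relies on Python's own iso8859_1 and iso8859_9
     codecs rather than on the conversion modules in
     /usr/lib/iconv, and the helper name differing_bytes() is
     hypothetical.

```python
def differing_bytes(codec_a="iso8859_1", codec_b="iso8859_9"):
    """Map each byte value to the pair of characters it decodes to,
    for every byte in 0xA0-0xFF where the two codecs disagree."""
    diffs = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


if __name__ == "__main__":
    # Print code points rather than the characters themselves so the
    # output is plain ASCII regardless of the terminal's code set.
    for value, (latin1, latin5) in sorted(differing_bytes().items()):
        print("0x%02X: U+%04X -> U+%04X" % (value, ord(latin1), ord(latin5)))
```

     With these codecs, only six byte values differ: the
     positions that hold the Icelandic letters eth, thorn, and
     y-acute (upper and lower case) in Latin 1 hold Turkish
     letters in Latin 5.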