iconv_unicode(5)      Standards, Environments, and Macros     iconv_unicode(5)



NAME
       iconv_unicode - code set conversion tables for Unicode

DESCRIPTION
       The following code set conversions are supported:

                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       ISO 8859-1 (Latin 1)    8859-1            UTF-8               UTF-8
       ISO 8859-2 (Latin 2)    8859-2            UTF-8               UTF-8
       ISO 8859-3 (Latin 3)    8859-3            UTF-8               UTF-8
       ISO 8859-4 (Latin 4)    8859-4            UTF-8               UTF-8
       ISO 8859-5 (Cyrillic)   8859-5            UTF-8               UTF-8
       ISO 8859-6 (Arabic)     8859-6            UTF-8               UTF-8
       ISO 8859-7 (Greek)      8859-7            UTF-8               UTF-8
       ISO 8859-8 (Hebrew)     8859-8            UTF-8               UTF-8
       ISO 8859-9 (Latin 5)    8859-9            UTF-8               UTF-8
       ISO 8859-10 (Latin 6)   8859-10           UTF-8               UTF-8
       Japanese EUC            eucJP             UTF-8               UTF-8
       Chinese/PRC EUC
       (GB 2312-1980)          gb2312            UTF-8               UTF-8
       ISO-2022                iso2022           UTF-8               UTF-8
       Korean EUC              ko_KR-euc         Korean UTF-8        ko_KR-UTF-8
       ISO-2022-KR             ko_KR-iso2022-7   Korean UTF-8        ko_KR-UTF-8
       Korean Johap
       (KS C 5601-1987)        ko_KR-johap       Korean UTF-8        ko_KR-UTF-8
       Korean Johap
       (KS C 5601-1992)        ko_KR-johap92     Korean UTF-8        ko_KR-UTF-8
       Korean UTF-8            ko_KR-UTF-8       Korean EUC          ko_KR-euc
       Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap
                                                 (KS C 5601-1987)
       Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap92
                                                 (KS C 5601-1992)
       KOI8-R (Cyrillic)       KOI8-R            UCS-2               UCS-2
       KOI8-R (Cyrillic)       KOI8-R            UTF-8               UTF-8
       PC Kanji (SJIS)         PCK               UTF-8               UTF-8
       PC Kanji (SJIS)         SJIS              UTF-8               UTF-8
       UCS-2                   UCS-2             KOI8-R (Cyrillic)   KOI8-R
       UCS-2                   UCS-2             UCS-4               UCS-4

                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       UCS-2              UCS-2           UTF-7                   UTF-7
       UCS-2              UCS-2           UTF-8                   UTF-8
       UCS-4              UCS-4           UCS-2                   UCS-2
       UCS-4              UCS-4           UTF-16                  UTF-16
       UCS-4              UCS-4           UTF-7                   UTF-7
       UCS-4              UCS-4           UTF-8                   UTF-8
       UTF-16             UTF-16          UCS-4                   UCS-4
       UTF-16             UTF-16          UTF-8                   UTF-8
       UTF-7              UTF-7           UCS-2                   UCS-2
       UTF-7              UTF-7           UCS-4                   UCS-4
       UTF-7              UTF-7           UTF-8                   UTF-8
       UTF-8              UTF-8           ISO 8859-1 (Latin 1)    8859-1
       UTF-8              UTF-8           ISO 8859-2 (Latin 2)    8859-2
       UTF-8              UTF-8           ISO 8859-3 (Latin 3)    8859-3
       UTF-8              UTF-8           ISO 8859-4 (Latin 4)    8859-4
       UTF-8              UTF-8           ISO 8859-5 (Cyrillic)   8859-5
       UTF-8              UTF-8           ISO 8859-6 (Arabic)     8859-6
       UTF-8              UTF-8           ISO 8859-7 (Greek)      8859-7
       UTF-8              UTF-8           ISO 8859-8 (Hebrew)     8859-8
       UTF-8              UTF-8           ISO 8859-9 (Latin 5)    8859-9
       UTF-8              UTF-8           ISO 8859-10 (Latin 6)   8859-10
       UTF-8              UTF-8           Japanese EUC            eucJP
       UTF-8              UTF-8           Chinese/PRC EUC         gb2312
                                          (GB 2312-1980)
       UTF-8              UTF-8           ISO-2022                iso2022
       UTF-8              UTF-8           KOI8-R (Cyrillic)       KOI8-R
       UTF-8              UTF-8           PC Kanji (SJIS)         PCK
       UTF-8              UTF-8           PC Kanji (SJIS)         SJIS
       UTF-8              UTF-8           UCS-2                   UCS-2
       UTF-8              UTF-8           UCS-4                   UCS-4
       UTF-8              UTF-8           UTF-16                  UTF-16
       UTF-8              UTF-8           UTF-7                   UTF-7
       UTF-8              UTF-8           Chinese/PRC EUC         zh_CN.euc
                                          (GB 2312-1980)

                           CODE SET CONVERSIONS SUPPORTED
                           ------------------------------
         FROM Code Set                               TO Code Set
             Code              FROM          Target Code            TO
                               Filename                             Filename
                               Element                              Element

       UTF-8                 UTF-8             ISO 2022-CN           zh_CN.iso2022-7
       UTF-8                 UTF-8             Chinese/Taiwan Big5   zh_TW-big5
       UTF-8                 UTF-8             Chinese/Taiwan  EUC   zh_TW-euc
                                               (CNS 11643-1992)
       UTF-8                 UTF-8             ISO 2022-TW           zh_TW-iso2022-7
       Chinese/PRC EUC       zh_CN.euc         UTF-8                 UTF-8
       (GB 2312-1980)
       ISO 2022-CN           zh_CN.iso2022-7   UTF-8                 UTF-8
       Chinese/Taiwan Big5   zh_TW-big5        UTF-8                 UTF-8
       Chinese/Taiwan  EUC   zh_TW-euc         UTF-8                 UTF-8
       (CNS 11643-1992)
       ISO 2022-TW           zh_TW-iso2022-7   UTF-8                 UTF-8

EXAMPLES
       Example 1: The library module filename

       In  the conversion library, /usr/lib/iconv (see iconv(3C)), the library
       module filename is composed of two symbolic elements separated  by  the
       percent sign (%). The first symbol specifies the code set that is being
       converted; the second symbol specifies the target code,  that  is,  the
       code set to which the first one is being converted.

       In  the  conversion  table above, the first  symbol is termed the "FROM
       Filename Element". The second symbol, representing the target code set,
       is the "TO Filename Element".

       For example, the library module filename to convert from the Korean EUC
       code set to the Korean UTF-8 code set is

       ko_KR-euc%ko_KR-UTF-8
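
       The naming rule above can be sketched in a few lines of Python. The
       function name module_filename is hypothetical and is not part of any
       Solaris interface; it only joins the two filename elements from the
       tables with a percent sign.

```python
def module_filename(from_element: str, to_element: str) -> str:
    """Join a FROM and a TO filename element with a percent sign."""
    for element in (from_element, to_element):
        # An empty element, or one containing '%', would make the name ambiguous.
        if not element or "%" in element:
            raise ValueError(f"invalid filename element: {element!r}")
    return f"{from_element}%{to_element}"


print(module_filename("ko_KR-euc", "ko_KR-UTF-8"))  # ko_KR-euc%ko_KR-UTF-8
```

       The check for an embedded percent sign is a precaution added for this
       sketch; none of the elements listed in the tables contains one.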

FILES
       /usr/lib/iconv/*.so             conversion modules
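
       As a rough illustration, the conversions installed on a system could
       be listed by splitting each module name at the percent sign. This
       sketch assumes that module files are named FROM%TO.so, which combines
       the naming rule in EXAMPLES with the pattern above but is not stated
       outright on this page. The helper names parse_module_name and
       installed_conversions are hypothetical.

```python
import glob
import os


def parse_module_name(path):
    """Split a module path into (from_element, to_element), or return None."""
    name = os.path.basename(path)
    if name.endswith(".so"):
        name = name[: -len(".so")]
    from_element, sep, to_element = name.partition("%")
    if not sep or not from_element or not to_element:
        return None  # not in FROM%TO form
    return from_element, to_element


def installed_conversions(directory="/usr/lib/iconv"):
    """Return sorted (from, to) pairs for the modules found in directory."""
    pairs = []
    for path in glob.glob(os.path.join(glob.escape(directory), "*.so")):
        parsed = parse_module_name(path)
        if parsed is not None:
            pairs.append(parsed)
    return sorted(pairs)


for from_element, to_element in installed_conversions():
    print(f"{from_element} -> {to_element}")
```

       On a system without the directory, the loop simply prints nothing.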



SEE ALSO
       iconv(1), iconv(3C), iconv(5)

       Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM
       Development Team, July 1993.

       Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for Inter-
       net Messages, RFC 1557, Solvit Chosun Media, December 1993.

       Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation  Format
       of Unicode, RFC 1642, Taligent, Inc., July 1994.

       Lee,  F.,  HZ - A Data Format for Exchanging Files of Arbitrarily Mixed
       Chinese and ASCII characters, RFC  1843,  Stanford  University,  August
       1995.

       Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding
       for Internet Messages, RFC 1468, Keio  University,  Panda  Programming,
       June 1993.

       Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet
       Messages, RFC 1555, Israeli Inter-University, Hebrew University, Decem-
       ber 1993.

       Ohta,  M.,  Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo
       Institute of Technology, July 1995.

       Ohta, M.,  and  K.  Handa,  ISO-2022-JP-2:  Multilingual  Extension  of
       ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.

       Reynolds,  J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of
       Southern California/Information Sciences Institute, October 1994.

       Simonson, K., Character Mnemonics & Character Sets, RFC 1345,  Rationel
       Almen Planlaegning, June 1992.

       Spinellis,  D.,  Greek Character Encoding for Electronic Mail Messages,
       RFC 1947, SENA S.A., May 1996.

       The Unicode Consortium, The Unicode Standard, Version 2.0, Addison Wes-
       ley Developers Press, July 1996.

       Wei,  Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII Printable Char-
       acters-Based Chinese Character  Encoding  for  Internet  Messages,  RFC
       1842, AsiaInfo Services Inc., Harvard University, Rice University, Uni-
       versity of Maryland, August 1995.

       Yergeau, F., UTF-8, a transformation format of Unicode and  ISO  10646,
       RFC 2044, Alis Technologies, October 1996.

       Zhu,  H.,  D.  Hu,  Z.  Wang, T. Kao, W. Chang, and M. Crispin, Chinese
       Character Encoding for Internet Messages, RFC  1922,  Tsinghua  Univer-
       sity,  China Information Technology Standardization Technical Committee
       (CITS), Institute for Information Industry (III), University  of  Wash-
       ington, March 1996.

NOTES
       ISO  8859  character sets using Latin alphabetic characters are distin-
       guished as follows:

       ISO 8859-1 (Latin 1)

           For most West European languages, including:



           Albanian         Finnish          Italian
           Catalan          French           Norwegian
           Danish           German           Portuguese
           Dutch            Galician         Spanish
           English          Irish            Swedish
           Faeroese         Icelandic



       ISO 8859-2 (Latin 2)

           For most Latin-written Slavic and Central European languages:



           Czech            Polish           Slovak
           German           Rumanian         Slovene
           Hungarian        Croatian



       ISO 8859-3 (Latin 3)

           Popularly used for Esperanto, Galician, Maltese, and Turkish.



       ISO 8859-4 (Latin 4)

           Introduces  letters for Estonian, Latvian, and Lithuanian. It is an
           incomplete predecessor of ISO 8859-10 (Latin 6).



       ISO 8859-9 (Latin 5)

           Replaces the rarely needed Icelandic letters in ISO  8859-1  (Latin
           1) with the Turkish ones.



       ISO 8859-10 (Latin 6)

           Adds  the  last Inuit (Greenlandic) and Sami (Lappish) letters that
           were not included in ISO 8859-4 (Latin 4) to complete  coverage  of
           the Nordic area.
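
       The relationship between Latin 1 and Latin 5 described above can be
       checked by decoding every byte value with both code sets and listing
       the positions where the results differ. This is a sketch that uses
       Python's built-in latin_1 and iso8859_9 codecs rather than the
       Solaris conversion modules; the function name differing_bytes is
       hypothetical.

```python
import unicodedata


def differing_bytes(codec_a, codec_b):
    """Return (byte, char_a, char_b) for bytes the two codecs decode differently."""
    differences = []
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            differences.append((value, char_a, char_b))
    return differences


for value, latin1, latin5 in differing_bytes("latin_1", "iso8859_9"):
    # Print character names so the output stays ASCII-only.
    print(f"0x{value:02X}  {unicodedata.name(latin1)}  ->  {unicodedata.name(latin5)}")
```

       The differing positions hold the Icelandic letters eth, thorn, and
       y with acute in Latin 1, and Turkish letters in Latin 5.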





SunOS 5.10                        18 Apr 1997                 iconv_unicode(5)