StarQuest Technical Documents

StarSQL Character Conversion and National Language Support

Last Update: 08 March 2011
Product: StarSQL for Linux, UNIX, and Windows (ODBC driver)
Version: v5.3 and later
Article ID: SQV00SQ015

Abstract

This document describes the format of the file that StarSQL for Linux, UNIX, and Windows (the ODBC driver) uses to look up CCSID values and to determine which character conversion routine to use when converting character data between disparate systems. It also lists the CCSIDs that StarSQL currently supports.

Solution

StarSQL uses a data-driven architecture to support character conversions, which allows support for specific languages and character encoding schemes to be added without modifying the StarSQL source code. You can specify particular Coded Character Set Identifiers (CCSIDs) to use by setting the TypDefOver settings of the data source definition that StarSQL uses to connect to the host. Refer to the documentation for your version of StarSQL for details about customizing the StarSQL driver data source.

The Conversion Table

StarSQL converts inbound data from the host system according to the conversions defined in the ccsid.csv table that is installed with StarSQL. The ccsid.csv table is platform-specific, and is installed to the \Programs\StarSQL directory of a Windows-based computer or to the $STARSQL/etc/conf subdirectory of a Linux- or UNIX-based computer. The format of the ccsid.csv table is as follows:

Column 1: CCSID
Column 2: 'A' for ASCII, 'E' for EBCDIC, 'U' for Unicode
Column 3: 'S' for SBCS, 'M' for MBCS, 'G' for Graphic
Column 4: mapping for the iconv codeset to use for the CCSID
Column 5: client locale codeset name, if different from the iconv codeset
Column 6: an optional single-byte CCSID to associate with a multi-byte CCSID
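The six-column layout above can be read with any CSV parser. The following Python sketch is illustrative only: the sample rows, the CcsidEntry class, and the parse_ccsid_table function are hypothetical and are not taken from the ccsid.csv file that ships with StarSQL. It assumes the file is comma-separated with no header row, and that an empty Column 5 means the locale codeset is the same as the iconv codeset.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CcsidEntry:
    ccsid: int                 # Column 1
    encoding: str              # Column 2: 'A', 'E', or 'U'
    kind: str                  # Column 3: 'S', 'M', or 'G'
    iconv_codeset: str         # Column 4
    locale_codeset: str        # Column 5 (falls back to Column 4 when empty)
    sbcs_ccsid: Optional[int]  # Column 6 (optional)


def parse_ccsid_table(text: str) -> Dict[int, CcsidEntry]:
    """Parse ccsid.csv-style text into a dictionary keyed by CCSID."""
    table: Dict[int, CcsidEntry] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so the optional trailing columns can be omitted.
        cols = [c.strip() for c in row] + [""] * (6 - len(row))
        ccsid = int(cols[0])
        table[ccsid] = CcsidEntry(
            ccsid=ccsid,
            encoding=cols[1],
            kind=cols[2],
            iconv_codeset=cols[3],
            locale_codeset=cols[4] or cols[3],
            sbcs_ccsid=int(cols[5]) if cols[5] else None,
        )
    return table


# Hypothetical sample rows, invented for this example.
SAMPLE = """\
037,E,S,IBM037,,
930,E,M,IBM930,,290
1208,U,M,UTF-8,,
"""

table = parse_ccsid_table(SAMPLE)
print(table[930])
```

Keying the dictionary by the integer CCSID means that "037" and "37" refer to the same entry.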

Contact StarQuest Customer Support to request that additional languages or codesets be added to the ccsid.csv table and conversion routines.

StarSQL uses the ccsid.csv table and conversion routines to convert all inbound data to characters defined by the local code page of the client computer, and to convert single-byte outbound data as necessary to match the CCSID expected by DB2.
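To illustrate what a CCSID-driven conversion does, the following Python sketch decodes EBCDIC bytes into text and encodes text back into EBCDIC. It is not StarSQL's conversion routine: it uses Python's built-in codecs as a stand-in, and the PYTHON_CODECS mapping and both function names are assumptions made for this example.

```python
# Hypothetical mapping from CCSID to a Python codec name. StarSQL itself
# takes the codeset name from ccsid.csv; this table is only a stand-in.
PYTHON_CODECS = {
    37: "cp037",    # EBCDIC, CCSID 037
    500: "cp500",   # International EBCDIC
    1208: "utf-8",  # UTF-8
}


def _codec_for(ccsid: int) -> str:
    try:
        return PYTHON_CODECS[ccsid]
    except KeyError:
        raise ValueError(f"unsupported CCSID: {ccsid}") from None


def decode_host_bytes(data: bytes, ccsid: int) -> str:
    """Inbound: convert host bytes in the given CCSID to text."""
    return data.decode(_codec_for(ccsid))


def encode_for_host(text: str, ccsid: int) -> bytes:
    """Outbound: convert text to bytes in the CCSID the host expects."""
    return text.encode(_codec_for(ccsid))


host_bytes = b"\xc8\x85\x93\x93\x96"     # "Hello" in CCSID 037
print(decode_host_bytes(host_bytes, 37))  # prints "Hello"
print(encode_for_host("Hello", 37).hex()) # prints "c885939396"
```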

StarSQL and DB2 tell each other which CCSIDs they plan to use in the TYPDEFOVR parameters that are sent at connect time. If you encounter a problem when connecting or sending data, review the error messages for information about an unsupported CCSID or an invalid CCSID in the TYPDEFOVR setting.
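When troubleshooting such an error, it can help to compare the CCSIDs in the override against the list of supported CCSIDs. The Python sketch below shows one way to do that. The dictionary representation of the override, the key names (modeled on the DRDA CCSIDSBC, CCSIDMBC, and CCSIDDBC parameters), and the unsupported_ccsids function are assumptions for illustration and are not part of any StarSQL interface. The SUPPORTED set is only an excerpt of the table in this document.

```python
from typing import Dict, List, Set, Tuple

# Excerpt of the supported CCSIDs listed in this document (not the full list).
SUPPORTED: Set[int] = {37, 273, 500, 819, 930, 939, 1200, 1208, 1252, 13488}


def unsupported_ccsids(
    typdefovr: Dict[str, int], supported: Set[int]
) -> List[Tuple[str, int]]:
    """Return (parameter name, CCSID) pairs that are not in the supported set."""
    return sorted(
        (name, ccsid) for name, ccsid in typdefovr.items()
        if ccsid not in supported
    )


# Hypothetical override values: single-byte, mixed-byte, and double-byte CCSIDs.
override = {"CCSIDSBC": 37, "CCSIDMBC": 1208, "CCSIDDBC": 99999}

for name, ccsid in unsupported_ccsids(override, SUPPORTED):
    print(f"{name}: CCSID {ccsid} is not in the supported list")
```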

Supported CCSIDs

StarSQL supports converting character data between a wide range of CCSID-to-CCSID pairs and CCSID-to-code-page pairs. It supports the Group 1, Group 1A, Group 2, and Unicode character sets as defined by CDRA (IBM Character Data Representation Architecture).

  • Group 1 covers Roman Alphabet Number 1, which is used in Australia, Hong Kong, New Zealand, North and South America, and Western Europe.

  • Group 1A covers multilingual scripts Cyrillic, Hebrew, Greek, and Turkish. The Latin 2 character set associated with Central Europe is supported in this group.

  • Group 2 covers double-byte coding for Japan, Korea, the People’s Republic of China, the Republic of China, and Thailand.

The following table lists the CCSIDs that StarSQL currently supports. Some CCSIDs may not be available on all platforms.

CCSID Description
037 Europe EBCDIC (Australia, Brazil, Canada, Netherlands, New Zealand, Portugal)
256 Netherlands EBCDIC
273 Austria, Germany EBCDIC
277 Denmark, Norway EBCDIC
278 Finland, Sweden EBCDIC
280 Italian EBCDIC
284 Spanish EBCDIC
285 United Kingdom EBCDIC
290 Japanese EBCDIC (SBCS)
297 French EBCDIC
300 Japanese EBCDIC (DBCS)
301 Japanese PC-Data (DBCS including 1880 UDC)
367 US ANSI X3.4 ASCII
420 Arabic EBCDIC
423 Greek EBCDIC
424 Hebrew EBCDIC
437 USA PC-Data
500 International EBCDIC
813 ISO 8859-7 ASCII
819 ISO 8859-1 ASCII (Latin Alphabet)
833 Korean EBCDIC
834 Korean EBCDIC (DBCS)
835 Traditional Chinese EBCDIC (DBCS)
836 Simplified Chinese EBCDIC (extended SBCS)
837 Simplified Chinese EBCDIC (MBCS)
838 Thailand EBCDIC
850 PC-Data MLP 222 Latin-1
856 Hebrew PC-Data
866 Cyrillic PC-Data
870 Latin-2 EBCDIC
871 Iceland EBCDIC
874 Thai PC-Data
875 Greek EBCDIC
878 KOI8 Russian Cyrillic
880 Cyrillic EBCDIC
895 Japanese (7-bit Latin)
897 Japanese PC-Data (SBCS)
905 Turkey EBCDIC
912 ISO 8859-2 ASCII
913 ISO 8859-3 ASCII
914 ISO 8859-4 ASCII
915 ISO 8859-5 ASCII
916 ISO 8859-8 ASCII
918 Urdu EBCDIC
920 ISO 8859-9 ASCII
921 ISO 8859-13 ASCII
923 ISO 8859-15 ASCII
924 Latin 9 EBCDIC
930 Japan EBCDIC (MBCS)
932 Japan PC-Data (MBCS)
933 Korea EBCDIC (MBCS)
935 Simplified Chinese EBCDIC (MBCS)
936 Simplified Chinese PC-Data (SBCS)
937 Traditional Chinese EBCDIC (SBCS)
938 Traditional Chinese PC-Data (MBCS)
939 Japan EBCDIC (MBCS)
943 Japan PC-Data (MBCS) for Open environment
949 Korea PC-Data (MBCS)
950 Traditional Chinese PC-Data (mixed for IBM BIG-5)
951 IBM KS PC-Data (MBCS)
954 Japanese EUC
964 Traditional Chinese EUC
970 Korean EUC
1025 Cyrillic EBCDIC
1026 Turkey Latin-5 EBCDIC
1027 Japan Latin EBCDIC
1041 Japan PC-Data
1046 Arabic PC-Data
1047 Latin Open System EBCDIC
1051 HP emulation
1088 Korea KS PC-Data
1089 Arabic ISO 8859-6
1097 Farsi EBCDIC
1112 Baltic EBCDIC
1122 Estonia EBCDIC
1123 Ukraine EBCDIC
1130 Vietnamese EBCDIC
1132 Lao EBCDIC
1140 COM Europe ECECP
1141 Austria, Germany ECECP
1142 Denmark, Norway ECECP
1143 Finland, Sweden ECECP
1144 Italian ECECP
1145 Spanish ECECP
1146 United Kingdom ECECP
1147 French ECECP
1148 International ECECP
1149 Iceland ECECP
1153 Latin-2 EBCDIC
1154 Cyrillic EBCDIC
1155 Turkey Latin-5 with euro
1156 Baltic, Multilingual with euro
1157 Estonia EBCDIC
1160 Thai EBCDIC (SBCS)
1161 Thai PC-Data (SBCS)
1167 KOI8 Russian
1168 KOI8 Ukrainian
1200 UTF-16 Big Endian with IBM PUA
1208 UTF-8 with IBM PUA
1250 MS-Windows Latin-2
1251 MS-Windows Cyrillic
1252 MS-Windows Latin-1
1253 MS-Windows Greek
1254 MS-Windows Turkey
1255 MS-Windows Hebrew
1256 MS-Windows Arabic
1257 MS-Windows Baltic
1258 MS-Windows Vietnamese
1363 MS-Windows Korean
1364 Korean mixed Extended
1375 Big-5 extension for HKSCS (MBCS)
1381 Simplified Chinese PC-Data mixed (IBM GB)
1383 Simplified Chinese EUC
1386 Simplified Chinese PC-Data GBK
1388 Simplified Chinese EBCDIC (MBCS)
1390 Extended Japanese Katakana-Kanji (Extended SBCS)
1392 Simplified Chinese PC-Data mixed for GB18030
1399 Extended Japanese Latin-Kanji (Extended SBCS)
4930 Korean (Extended DBCS)
4933 Simplified Chinese EBCDIC
4971 Greek EBCDIC
5026 Japanese Katakana EBCDIC
5035 Japanese English EBCDIC
5050 Japanese EUC
5123 Japanese Latin (Extended SBCS)
5347 MS-Windows Cyrillic
5488 Simplified Chinese PC-Data mixed (fixed) for GB18030
8482 Japanese Katakana
8612 Arabic EBCDIC
9005 Greek ISO 8859-7:2003
9030 Thai (Extended SBCS)
12712 Hebrew EBCDIC
13121 Korean (Extended SBCS)
13124 Simplified Chinese EBCDIC
13488 Unicode UTF-16
16684 Extended Japanese Latin (DBCS)
16804 Arabic EBCDIC
28709 Traditional Chinese EBCDIC

DISCLAIMER

The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization.  The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.