StarQuest Technical Documents

StarSQL Character Conversion and National Language Support

Last Update: 08 March 2011
Product: StarSQL for Linux, UNIX, and Windows (ODBC driver)
Version: v5.3 and later
Article ID: SQV00SQ015

Abstract

This document describes the format of the file that StarSQL for Linux, UNIX, and Windows (the ODBC driver) uses to look up CCSID values and to determine which character conversion routine to use when converting character data between disparate systems. It also lists the CCSIDs that StarSQL currently supports.

Solution

StarSQL uses a data-driven architecture to support character conversions, which allows support for specific languages and character encoding schemes to be added without modifying the StarSQL source code. You can specify particular Coded Character Set Identifiers (CCSIDs) to use by setting the TypDefOver settings of the data source definition that StarSQL uses to connect to the host. Refer to the documentation for your version of StarSQL for details about customizing the StarSQL driver data source.

The Conversion Table

StarSQL performs inbound data conversion from the host system based upon the conversions that are defined in the ccsid.csv table that is installed with StarSQL. The ccsid.csv table is platform-specific, and is installed to the \Programs\StarSQL directory of a Windows-based computer or to the $STARSQL/etc/conf subdirectory of a Linux- or UNIX-based computer. The format of the ccsid.csv table is as follows:

Column 1	Column 2	Column 3	Column 4	Column 5	Column 6
CCSID	'A' for ASCII, 'E' for EBCDIC, 'U' for Unicode	'S' for SBCS, 'M' for MBCS, 'G' for Graphic	Mapping for the iconv codeset to use for the CCSID	Client locale codeset name, if different from the iconv codeset	An optional single-byte CCSID to associate with a multi-byte CCSID
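
The six-column layout above can be read with a few lines of code. The following is a minimal sketch, not the StarSQL implementation: it assumes the file is plain comma-separated text with unquoted values and empty trailing fields for the optional columns. The sample rows, the CcsidEntry class, and the parse_ccsid_table function are hypothetical and are not taken from a real ccsid.csv file.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CcsidEntry:
    ccsid: int                     # Column 1: the CCSID number
    encoding: str                  # Column 2: 'A', 'E', or 'U'
    kind: str                      # Column 3: 'S', 'M', or 'G'
    iconv_codeset: str             # Column 4: iconv codeset name
    locale_codeset: Optional[str]  # Column 5: only if different from iconv
    sbcs_ccsid: Optional[int]      # Column 6: optional single-byte CCSID


def parse_ccsid_table(text: str) -> Dict[int, CcsidEntry]:
    """Parse comma-separated rows into a dict keyed by CCSID."""
    entries: Dict[int, CcsidEntry] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so the optional columns can be omitted entirely.
        cols = [c.strip() for c in row] + [""] * (6 - len(row))
        entry = CcsidEntry(
            ccsid=int(cols[0]),
            encoding=cols[1],
            kind=cols[2],
            iconv_codeset=cols[3],
            locale_codeset=cols[4] or None,
            sbcs_ccsid=int(cols[5]) if cols[5] else None,
        )
        entries[entry.ccsid] = entry
    return entries


# Hypothetical sample rows for illustration only.
SAMPLE = """37,E,S,IBM-037,,
930,E,M,IBM-930,,290
1208,U,M,UTF-8,,
"""

table = parse_ccsid_table(SAMPLE)
print(table[930].sbcs_ccsid)
```

Keying the result by CCSID mirrors how the table is used: a CCSID reported by the host is looked up to find the codeset name that drives the conversion.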

Contact StarQuest Customer Support to request that additional languages or codesets be added to the ccsid.csv table and conversion routines.

StarSQL uses the ccsid.csv table and conversion routines to convert all inbound data to characters defined by the local code page of the client computer, and to convert single-byte outbound data as necessary to match the CCSID expected by DB2.
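
To illustrate what an inbound and an outbound conversion look like, the sketch below uses Python's built-in codecs as a stand-in for the iconv routines that StarSQL relies on. The CCSID_TO_CODEC mapping and the inbound and outbound function names are assumptions made for this example; they are not part of StarSQL.

```python
# Assumed mapping from a few CCSIDs to Python codec names (illustrative only).
CCSID_TO_CODEC = {
    37: "cp037",     # EBCDIC
    500: "cp500",    # International EBCDIC
    819: "latin_1",  # ISO 8859-1
    1208: "utf_8",   # UTF-8
    1252: "cp1252",  # MS-Windows Latin-1
}


def inbound(data: bytes, host_ccsid: int) -> str:
    """Decode bytes received from the host into text."""
    return data.decode(CCSID_TO_CODEC[host_ccsid])


def outbound(text: str, host_ccsid: int) -> bytes:
    """Encode text into the byte representation the host expects."""
    return text.encode(CCSID_TO_CODEC[host_ccsid])


# "HELLO" as encoded in EBCDIC CCSID 037.
host_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(inbound(host_bytes, 37))    # HELLO
print(outbound("HELLO", 37).hex())  # c8c5d3d3d6
```

The same five characters occupy entirely different byte values in EBCDIC and ASCII, which is why every inbound value must be converted before a client application can display it.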

StarSQL and DB2 communicate the CCSIDs that they plan to use in the TYPDEFOVR parameters that are sent at connect time. If you encounter a problem when connecting or sending data, review the error messages for information about an unsupported CCSID or an invalid CCSID in the TYPDEFOVR setting.
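
When diagnosing such an error, it can help to compare the CCSIDs in question against the list of supported values. The following is a rough sketch under stated assumptions: the SUPPORTED set holds only a small subset of the CCSIDs listed in this document, the parameter names CCSIDSBC, CCSIDDBC, and CCSIDMBC are assumed labels for the single-byte, double-byte, and mixed-byte values, and the unsupported_ccsids function and the value 1234 are hypothetical.

```python
# A small subset of the CCSIDs listed under "Supported CCSIDs" (illustrative).
SUPPORTED = {37, 273, 290, 300, 500, 819, 930, 939, 1200, 1208, 1252, 13488}


def unsupported_ccsids(typdefovr: dict) -> dict:
    """Return the entries whose CCSID is not in the supported set."""
    return {
        name: ccsid
        for name, ccsid in typdefovr.items()
        if ccsid not in SUPPORTED
    }


# Hypothetical values; 1234 stands in for an unsupported CCSID.
settings = {"CCSIDSBC": 37, "CCSIDDBC": 1234, "CCSIDMBC": 1208}
print(unsupported_ccsids(settings))  # {'CCSIDDBC': 1234}
```

An empty result means that every CCSID in the setting appears in the supported set, so the cause of the error lies elsewhere.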

Supported CCSIDs

StarSQL supports converting character data between a wide range of CCSID-to-CCSID pairs and CCSID-to-code-page pairs. It supports the Group 1, Group 1A, Group 2, and Unicode character sets as defined by CDRA (IBM Character Data Representation Architecture).

Group 1 covers the Roman Alphabet Number 1, which includes Australia, Hong Kong, New Zealand, North and South America, and Western Europe.
Group 1A covers the multilingual scripts Cyrillic, Hebrew, Greek, and Turkish. The Latin 2 character set associated with Central Europe is also supported in this group.
Group 2 covers double-byte coding for Japan, Korea, the People’s Republic of China, the Republic of China, and Thailand.

The following table lists the CCSIDs that StarSQL currently supports. Some CCSIDs may not be available on all platforms.

CCSID	Description
037	Europe EBCDIC (Australia, Brazil, Canada, Netherlands, New Zealand, Portugal)
256	Netherlands EBCDIC
273	Austria, Germany EBCDIC
277	Denmark, Norway EBCDIC
278	Finland, Sweden EBCDIC
280	Italian EBCDIC
284	Spanish EBCDIC
285	United Kingdom EBCDIC
290	Japanese EBCDIC (SBCS)
297	French EBCDIC
300	Japanese EBCDIC (DBCS)
301	Japanese PC-Data (DBCS including 1880 UDC)
367	US ANSI X3.4 ASCII
420	Arabic EBCDIC
423	Greek EBCDIC
424	Hebrew EBCDIC
437	USA PC-Data
500	International EBCDIC
813	ISO 8859-7 ASCII
819	ISO 8859-1 ASCII (Latin Alphabet)
833	Korean EBCDIC
834	Korean EBCDIC (DBCS)
835	Traditional Chinese EBCDIC (DBCS)
836	Simplified Chinese EBCDIC (extended SBCS)
837	Simplified Chinese EBCDIC (MBCS)
838	Thailand EBCDIC
850	PC-Data MLP 222 Latin-1
856	Hebrew PC-Data
866	Cyrillic PC-Data
870	Latin-2 EBCDIC
871	Iceland EBCDIC
874	Thai PC-Data
875	Greek EBCDIC
878	KOI8 Russian Cyrillic
880	Cyrillic EBCDIC
895	Japanese (7-bit Latin)
897	Japanese PC-Data (SBCS)
905	Turkey EBCDIC
912	ISO 8859-2 ASCII
913	ISO 8859-3 ASCII
914	ISO 8859-4 ASCII
915	ISO 8859-5 ASCII
916	ISO 8859-8 ASCII
918	Urdu EBCDIC
920	ISO 8859-9 ASCII
921	ISO 8859-13 ASCII
923	ISO 8859-15 ASCII
924	Latin 9 EBCDIC
930	Japan EBCDIC (MBCS)
932	Japan PC-Data (MBCS)
933	Korea EBCDIC (MBCS)
935	Simplified Chinese EBCDIC (MBCS)
936	Simplified Chinese PC-Data (SBCS)
937	Traditional Chinese EBCDIC (SBCS)
938	Traditional Chinese PC-Data (MBCS)
939	Japan EBCDIC (MBCS)
943	Japan PC-Data (MBCS) for Open environment
949	Korea PC-Data (MBCS)
950	Traditional Chinese PC-Data (mixed for IBM BIG-5)
951	IBM KS PC-Data (MBCS)
954	Japanese EUC
964	Traditional Chinese EUC
970	Korean EUC
1025	Cyrillic EBCDIC
1026	Turkey Latin-5 EBCDIC
1027	Japan Latin EBCDIC
1041	Japan PC-Data
1046	Arabic PC-Data
1047	Latin Open System EBCDIC
1051	HP emulation
1088	Korea KS PC-Data
1089	Arabic ISO 8859-6
1097	Farsi EBCDIC
1112	Baltic EBCDIC
1122	Estonia EBCDIC
1123	Ukraine EBCDIC
1130	Vietnamese EBCDIC
1132	Lao EBCDIC
1140	COM Europe ECECP
1141	Austria, Germany ECECP
1142	Denmark, Norway ECECP
1143	Finland, Sweden ECECP
1144	Italian ECECP
1145	Spanish ECECP
1146	United Kingdom ECECP
1147	French ECECP
1148	International ECECP
1149	Iceland ECECP
1153	Latin-2 EBCDIC
1154	Cyrillic EBCDIC
1155	Turkey Latin-5 with euro
1156	Baltic, Multilingual with euro
1157	Estonia EBCDIC
1160	Thai EBCDIC (SBCS)
1161	Thai PC-Data (SBCS)
1167	KOI8 Russian
1168	KOI8 Ukrainian
1200	UTF-16 Big Endian with IBM PUA
1208	UTF-8 with IBM PUA
1250	MS-Windows Latin-2
1251	MS-Windows Cyrillic
1252	MS-Windows Latin-1
1253	MS-Windows Greek
1254	MS-Windows Turkey
1255	MS-Windows Hebrew
1256	MS-Windows Arabic
1257	MS-Windows Baltic
1258	MS-Windows Vietnamese
1363	MS-Windows Korean
1364	Korean mixed Extended
1375	Big-5 extension for HKSCS (MBCS)
1381	Simplified Chinese PC-Data mixed (IBM GB)
1383	Simplified Chinese EUC
1386	Simplified Chinese PC-Data GBK
1388	Simplified Chinese EBCDIC (MBCS)
1390	Extended Japanese Katakana-Kanji (Extended SBCS)
1392	Simplified Chinese PC-Data mixed for GB18030
1399	Extended Japanese Latin-Kanji (Extended SBCS)
4930	Korean (Extended DBCS)
4933	Simplified Chinese EBCDIC
4971	Greek EBCDIC
5026	Japanese Katakana EBCDIC
5035	Japanese English EBCDIC
5050	Japanese EUC
5123	Japanese Latin (Extended SBCS)
5347	MS-Windows Cyrillic
5488	Simplified Chinese PC-Data mixed (fixed) for GB18030
8482	Japanese Katakana
8612	Arabic EBCDIC
9005	Greek ISO 8859-7:2003
9030	Thai (Extended SBCS)
12712	Hebrew EBCDIC
13121	Korean (Extended SBCS)
13124	Simplified Chinese EBCDIC
13488	Unicode UTF-16
16684	Extended Japanese Latin (DBCS)
16804	Arabic EBCDIC
28709	Traditional Chinese EBCDIC

DISCLAIMER

The information in technical documents comes without any warranty or applicability for a specific purpose. The author(s) or distributor(s) will not accept responsibility for any damage incurred directly or indirectly through use of the information contained in these documents. The instructions may need to be modified to be appropriate for the hardware and software that has been installed and configured within a particular organization. The information in technical documents should be considered only as an example and may include information from various sources, including IBM, Microsoft, and other organizations.