Luke Konecki Luke Konecki - 1 month ago 10
C# Question

OdbcConnection returning Chinese Characters as "?"

I have an Oracle database that stores some data values in Simplified Chinese. I have created an ASP.net MVC C# webpage that is supposed to display this information. I am using a

OdbcConnection
in order to retrieve the data, however when I run my
da.Fill(t)
command the values return as "?"

OdbcCommand cmd = new OdbcCommand();
cmd.CommandText = select;

OdbcConnection SqlConn = new OdbcConnection("Driver={Oracle in instantclient_11_2};Dbq=Database;Uid=Username;pwd=password;");
DataTable t = new DataTable();
cmd.Connection = SqlConn;

SqlConn.Open();
OdbcDataAdapter da = new OdbcDataAdapter(cmd);
SqlConn.Close();
da.Fill(t);
return t;


t
has the data but everything that is supposed to be the Chinese characters is just a series of "?????"

Answer

Problems with character set are quite common, let me try to give some general notes.

In principle you have to consider four different character set settings.

1 and 2: NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET

Example: AL32UTF8

They are defined only on your database, you can interrogate them with

    SELECT * 
    FROM V$NLS_PARAMETERS 
    WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

These settings define which characters (in which format) can be stored in your database - no more, no less. It requires some effort if you have to make a Character Set Migration on existing database.

3: NLS_LANG

Example: AMERICAN_AMERICA.AL32UTF8

This value is defined only on your client. NLS_LANG has nothing to do with the ability to store characters in a database. It is used to let Oracle know what character set you are using on the client side (e.g. character set of your terminal or the encoding of your .sql files).

NLS_LANG can be defined by environment variable NLS_LANG or by Windows Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 32 bit), resp. HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 64 bit). If NLS_LANG value is not provided then Oracle defaults it to AMERICAN_AMERICA.US7ASCII

Format of NLS_LANG is NLS_LANG=language_territory.charset. The {charset} part of NLS_LANG is not shown in any system table or view. All components of the NLS_LANG definition are optional, so following definitions are all valid: NLS_LANG=.WE8ISO8859P1, NLS_LANG=_GERMANY, NLS_LANG=AMERICAN, NLS_LANG=ITALIAN_.WE8MSWIN1252, NLS_LANG=_BELGIUM.US7ASCII.

Find more very useful information here: NLS_LANG FAQ

4: The "real" character set of your terminal, your application or the encoding of .sql files

Example: UTF-8

If you work on a Windows terminal (i.e. with SQL*plus) you can interrogate the codepage with command chcp, on Unix/Linux the equivalent is locale charmap or echo $LANG.

If you work with .sql files and an editor like TOAD or SQL-Developer you have to check the save options. Typically you can choose values like UTF-8, ANSI, ISO-8859-1, etc. ANSI means the Windows ANSI codepage, typically CP1252, you can check in your Registry at HKLM\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP or here: National Language Support (NLS) API Reference

Note: Depending on your technology you don't have to care so much about this topic, some examples:

  • ODP.NET (unmanged driver) automatically inherits the character set from NLS_LANG value.

  • ODP.NET Managed Driver is not NLS_LANG sensitive. It is only .NET locale sensitive. (see Data Provider for .NET Developer's Guide)

  • OraOLEDB (from Oracle) always use UTF-16 (see OraOLEDB Provider Specific Features)

  • Java based JDBC has its own methods to deal with character sets (but I am not familiar with any details)

How to set all these values?

The most important point is to match NLS_LANG and your "real" character set of your terminal, resp. application or the encoding of your .sql files

Some common pairs are:

  • CP850 -> WE8PC850

  • CP1252 or ANSI -> WE8MSWIN1252

  • ISO-8859-1 -> WE8ISO8859P1

  • ISO-8859-15 -> WE8ISO8859P15

  • UTF-8 -> AL32UTF8

Or run this query to get some more:

SELECT VALUE AS ORACLE_CHARSET, UTL_I18N.MAP_CHARSET(VALUE) AS IANA_NAME
FROM V$NLS_VALID_VALUES
WHERE PARAMETER = 'CHARACTERSET';

Is it required to set client NLS_LANG value equal to database NLS_CHARACTERSET value?

No, not necessarily! Actually, if you consider that the database has two values (NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET) which are usually different, it is even not possible because on your client you have only one value.

For example, if you have the database character set NLS_CHARACTERSET=AL32UTF8 and the client character set NLS_LANG=.ZHS32GB18030 then it will work without any problem (provided your client really uses GB18030), although these character sets are completely different. GB18030 is a character set commonly used for Chinese, like UTF-8 it supports all Unicode characters.

If you have, for example NLS_CHARACTERSET=AL32UTF8 and NLS_LANG=.WE8ISO8859P1 it will also work (again, provided your client really uses ISO-8859-P1). However, the database may store characters which your client is not able to display, instead the client will display a placeholder (e.g. ¿).

Anyway, it is beneficial to have matching NLS_LANG and NLS_CHARACTERSET values, if suitable. If they are equal you can be sure that any character which may be stored in database can also be displayed and any character you enter in your terminal or write in your .sql file can also be stored in database and is not substituted by placeholder.

Supplement

So many times you can read advise like "The NLS_LANG character set must be the same as your database character set" (also here on SO). This is simply not true!

Here is the proof:

C:\>set NLS_LANG=.AL32UTF8

C:\>sqlplus ...

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2  CharSet VARCHAR2(20);
  3  BEGIN
  4     SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
  5     DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
  6     IF UNISTR('\20AC') = '€' THEN
  7             DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
  8     ELSE
  9             DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
 10     END IF;
 11  END;
 12  /

Database NLS_CHARACTERSET is AL32UTF8
"€" is not the same as U+20AC

PL/SQL procedure successfully completed.

Both, client and database character sets are AL32UTF8, however the characters do not match. The reason is, my cmd.exe and thus also SQL*Plus use Windows CP1252. Therefore I must set NLS_LANG accordingly:

C:\>chcp
Active code page: 1252

C:\>set NLS_LANG=.WE8MSWIN1252

C:\>sqlplus ...

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2  CharSet VARCHAR2(20);
  3  BEGIN
  4     SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
  5     DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
  6     IF UNISTR('\20AC') = '€' THEN
  7             DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
  8     ELSE
  9             DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
 10     END IF;
 11  END;
 12  /

Database NLS_CHARACTERSET is AL32UTF8
"€" is equal to U+20AC

PL/SQL procedure successfully completed.