
How to insert Chinese characters in Vertica

Posted: Fri Jun 28, 2013 11:19 am
by jagadeesh
Hi,

I am facing a problem while loading data into Vertica. My source system sends me some Chinese characters, and the rows containing them are being rejected when I insert them into Vertica.

Is there any special data type I have to define for that column?

Thanks,
Jagadeesh.

Re: How to insert Chinese characters in Vertica

Posted: Fri Jun 28, 2013 4:04 pm
by jagadeesh
I am using Informatica to load the data. In the session log I can see some Chinese characters along with a "bad record" message, so I suspect the rejections are caused by these special characters.

Re: How to insert Chinese characters in Vertica

Posted: Fri Jun 28, 2013 6:50 pm
by jagadeesh
Is there any special data type for non-ASCII data?

Re: How to insert Chinese characters in Vertica

Posted: Sat Jun 29, 2013 3:35 pm
by JimKnicely
Hi,

There is not a special data type in Vertica to store Chinese characters. All data in Vertica is stored using UTF8 encoding. Just make sure the source data is also encoded in UTF8.
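
If the source system emits a legacy encoding instead, the data has to be re-encoded before the load. The following Python sketch assumes, purely as an example, that the source uses GBK (a common Simplified Chinese encoding); the function name reencode_to_utf8 is hypothetical, and the real source encoding should be confirmed with whoever owns the source system.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    source_encoding="gbk" is only an assumed default for illustration.
    errors="strict" raises UnicodeDecodeError on bytes that are not valid
    in the source encoding, instead of silently mangling them.
    """
    text = data.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


# Example: the same three characters, first as GBK bytes, then converted.
gbk_bytes = "我爱你".encode("gbk")
utf8_bytes = reencode_to_utf8(gbk_bytes)
print(gbk_bytes.hex())   # ced2b0aec4e3
print(utf8_bytes.hex())  # e68891e788b1e4bda0
```

For a whole file, something like Path(dst).write_bytes(reencode_to_utf8(Path(src).read_bytes())) would work; for very large extracts, a streaming tool such as iconv may be more practical.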

Here is an example.

I created a text file named chinese.txt using Windows Notepad. The file contains the three Chinese characters that can be translated to "I Love You" in English.

They are: 我爱你

I made sure to save the file in Notepad using UTF8 encoding.
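
The same test file can also be created without Notepad. A minimal Python sketch (the function name write_utf8_sample is illustrative): passing encoding="utf-8" explicitly avoids depending on the operating system's default encoding, and no newline is written, which matches the "no line terminators" in the file command output.

```python
from pathlib import Path

# The three characters from the post ("I Love You").
TEXT = "我爱你"


def write_utf8_sample(path: Path) -> None:
    """Write TEXT to path as UTF-8, with no BOM and no trailing newline."""
    # encoding="utf-8" is explicit so the OS default (e.g. a Windows code
    # page) is never used; newline="" disables newline translation.
    with path.open("w", encoding="utf-8", newline="") as f:
        f.write(TEXT)


if __name__ == "__main__":
    write_utf8_sample(Path("chinese.txt"))
```

Python's "utf-8" codec writes no byte order mark; "utf-8-sig" would add one.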

Next, I transferred the file to the first node on my Vertica cluster.

The Linux file command can be used to verify that the file's encoding method is UTF8:


[dbadmin@vertica01 ~]$ file chinese.txt
chinese.txt: UTF-8 Unicode text, with no line terminators
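
A similar check can be scripted, which may help locate the offending bytes when a loader reports bad records. A minimal Python sketch; the function name first_invalid_utf8_offset is hypothetical, and passing the check only shows that the bytes are valid UTF-8, not that the source actually intended them as UTF-8 (plain ASCII passes too).

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    None means the bytes decode cleanly as UTF-8. Plain ASCII also
    returns None, because ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


print(first_invalid_utf8_offset("我爱你".encode("utf-8")))  # None
print(first_invalid_utf8_offset(b"abc\xff"))               # 3
```

To check a file, pass it the file's raw bytes, e.g. first_invalid_utf8_offset(Path("chinese.txt").read_bytes()).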

Notice that if I cat the file, I get a bunch of garbage because my terminal isn't set up to display the characters correctly. But that's okay.


[dbadmin@vertica01 ~]$ cat chinese.txt
æç±ä½ 
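
The garbage is what UTF-8 bytes look like in a terminal that assumes a single-byte character set. Assuming the terminal uses ISO-8859-1 (Latin-1), this Python sketch reproduces the output: each Chinese character is 3 UTF-8 bytes, each byte is shown as one Latin-1 character, and some of those are invisible control characters. Decoding the same bytes back as UTF-8 recovers the original text, so only the display is wrong.

```python
TEXT = "我爱你"

# The bytes actually stored in the file: 3 bytes per character here.
utf8_bytes = TEXT.encode("utf-8")

# What a Latin-1 terminal does: map every byte to exactly one character.
as_latin1 = utf8_bytes.decode("latin-1")

# Drop the C1 control characters (U+0080..U+009F), which terminals don't show.
visible = "".join(ch for ch in as_latin1 if not 0x80 <= ord(ch) <= 0x9F)

# The underlying bytes are intact: reinterpreting them as UTF-8 round-trips.
recovered = as_latin1.encode("latin-1").decode("utf-8")

print(len(utf8_bytes))  # 9
print(visible)
print(recovered == TEXT)  # True
```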

Next, I created a table in Vertica named chinese with one varchar column and then loaded the data from the chinese.txt file into it:


dbadmin=> create table chinese (c1 varchar(100));
CREATE TABLE

dbadmin=> copy chinese from '/home/dbadmin/chinese.txt';
Rows Loaded
-------------
           1
(1 row)

dbadmin=> select * from chinese;
   c1
--------
æç±ä½ 
(1 row)

I still get garbage output from my SQL statement.

But the good news is that the data in the table is fine.

If I use my DbVisualizer client to query the table, I see the Chinese characters just fine!
[Screenshot attachment: dbvis_chinese.png]
Note that I had to change the grid font in DbVisualizer to "Arial Unicode MS" to display the Chinese characters correctly.

The point is that Vertica can store Chinese characters as UTF8 in a varchar column!
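
One sizing detail when storing Chinese text: Vertica's VARCHAR(n) length is counted in bytes, not characters, and common Chinese characters take 3 bytes each in UTF-8 (rarer ones take 4), so the three characters above need 9 bytes. A minimal Python sketch for estimating a column width from sample values; the function names and sample list are illustrative only.

```python
from typing import Iterable


def utf8_byte_length(value: str) -> int:
    """Bytes the string occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def min_varchar_bytes(values: Iterable[str]) -> int:
    """Smallest byte length n such that every value fits in VARCHAR(n)."""
    return max((utf8_byte_length(v) for v in values), default=0)


samples = ["我爱你", "I Love You"]
for s in samples:
    # Character count vs. UTF-8 byte count.
    print(repr(s), len(s), utf8_byte_length(s))
print(min_varchar_bytes(samples))  # 10
```

A column sized by counting characters can therefore be too small for Chinese text, which is worth ruling out when rows are rejected.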

Re: How to insert Chinese characters in Vertica

Posted: Mon Jul 01, 2013 7:33 am
by nnani
I never knew about this.

Thanks for sharing that, Knicely.

Re: How to insert Chinese characters in Vertica

Posted: Mon Jul 01, 2013 1:24 pm
by billykopecki
That's pretty cool! Jim, how do you save a file in Notepad as UTF8?

Re: How to insert Chinese characters in Vertica

Posted: Mon Jul 01, 2013 2:20 pm
by nnani
Hi billy,

I am scratching my head over this too.
I didn't see any option while saving.