How to insert Chinese characters in Vertica

Moderator: NorbertKrupa

jagadeesh
Newbie
Posts: 21
Joined: Tue Feb 05, 2013 9:36 am

How to insert Chinese characters in Vertica

Post by jagadeesh » Fri Jun 28, 2013 11:19 am

Hi,

I am facing a problem while loading data into Vertica. My source system sends some Chinese characters, and those records are rejected on insert.

Is there any special data type I have to define for the specific column?

Thanks,
Jagadeesh.

Re: How to insert Chinese characters in Vertica

Post by jagadeesh » Fri Jun 28, 2013 4:04 pm

I am using Informatica to load the data. In the session log I can see some Chinese characters along with a "bad record" message, so I guess the rejection is caused by these special characters.

Re: How to insert Chinese characters in Vertica

Post by jagadeesh » Fri Jun 28, 2013 6:50 pm

Is there any special data type for non-ASCII data?

JimKnicely
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am

Re: How to insert Chinese characters in Vertica

Post by JimKnicely » Sat Jun 29, 2013 3:35 pm

Hi,

There is no special data type in Vertica for storing Chinese characters. Vertica stores all character data using UTF-8 encoding, so just make sure the source data is also encoded in UTF-8.

Here is an example.

I created a text file named chinese.txt using Windows Notepad. The file contains the three Chinese characters that can be translated to "I Love You" in English.

They are: 我爱你

I made sure to save the file in Notepad using UTF8 encoding.
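If the file comes from another system rather than from Notepad, it can help to confirm that its bytes really decode as UTF-8 before loading. The following is only a minimal sketch in Python (my choice for illustration, not something the thread used); the helper names `is_valid_utf8` and `file_is_utf8` are hypothetical, and GBK is just one example of a non-UTF-8 encoding a Chinese source system might send.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def file_is_utf8(path: str) -> bool:
    """Read a file in binary mode and check whether its bytes are valid UTF-8."""
    with open(path, "rb") as f:
        return is_valid_utf8(f.read())


sample = "我爱你"
print(is_valid_utf8(sample.encode("utf-8")))  # the text as UTF-8 bytes
print(is_valid_utf8(sample.encode("gbk")))    # the same text as GBK bytes
```

The first check passes and the second fails, because the GBK bytes for these characters are not a valid UTF-8 sequence. A file that fails this check would need to be re-encoded before it is loaded.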

Next, I transferred the file to the first node on my Vertica cluster.

The Linux file command can be used to verify that the file's encoding method is UTF8:

Code:

[dbadmin@vertica01 ~]$ file chinese.txt
chinese.txt: UTF-8 Unicode text, with no line terminators

Notice that if I cat the file, I get a bunch of garbage because my terminal isn't set up to display the characters correctly. But that's okay.

Code:

[dbadmin@vertica01 ~]$ cat chinese.txt
æç±ä½ 

Next I created a table in Vertica named chinese with one varchar column and then loaded the data from the chinese.txt file into it:

Code:

dbadmin=> create table chinese (c1 varchar(100));
CREATE TABLE

dbadmin=> copy chinese from '/home/dbadmin/chinese.txt';
Rows Loaded
-------------
           1
(1 row)

dbadmin=> select * from chinese;
   c1
--------
æç±ä½ 
(1 row)

I still get the garbage output from my SQL statement.
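
The garbage shown above is consistent with the UTF-8 bytes being displayed one byte at a time as Latin-1 characters. Here is a small Python sketch of that effect (Python is only used for illustration, and the Latin-1 terminal setting is an assumption about my environment, not something I verified):

```python
import unicodedata

text = "我爱你"
utf8_bytes = text.encode("utf-8")

# Assumption: the terminal treats every byte as one Latin-1 character.
as_latin1 = utf8_bytes.decode("latin-1")

# Control characters are usually not drawn, so drop them for display.
visible = "".join(ch for ch in as_latin1 if unicodedata.category(ch) != "Cc")
print(visible)

# Reversing the misinterpretation recovers the original text,
# so the stored bytes themselves are intact.
recovered = as_latin1.encode("latin-1").decode("utf-8")
print(recovered == text)
```

In other words, only the display is wrong. The nine bytes are the same ones that were in the file.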

But the good news is that the data in the table is fine.

If I use my DbVisualizer client to query the table, I see the Chinese characters just fine!
(Screenshot: dbvis_chinese.png)
Note that I had to change the grid font in DbVisualizer to "Arial Unicode MS" to display the Chinese characters correctly.

The point is that Vertica can store Chinese characters as UTF-8 in a varchar column!
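
One thing to keep in mind when sizing the column: as far as I know, the length in varchar(n) counts bytes rather than characters in Vertica, so please check the documentation for your version. Each of these Chinese characters takes three bytes in UTF-8. A quick Python sketch to compare character count and byte count (the helper name `utf8_byte_length` is just made up for this example):

```python
def utf8_byte_length(value: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


for value in ["I love you", "我爱你"]:
    # Print the text, its character count, and its UTF-8 byte count.
    print(value, len(value), utf8_byte_length(value))
```

The three Chinese characters need nine bytes, so under that assumption a column declared too narrowly for the byte count could be a reason for data not fitting.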
Jim Knicely

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

nnani
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am

Re: How to insert Chinese characters in Vertica

Post by nnani » Mon Jul 01, 2013 7:33 am

I never knew about it..

Thanks for sharing that knicely.
nnani........
Long way to go

You can check out my blogs at vertica-howto

billykopecki
Beginner
Beginner
Posts: 42
Joined: Thu Apr 19, 2012 9:03 pm

Re: How to insert Chinese characters in Vertica

Post by billykopecki » Mon Jul 01, 2013 1:24 pm

That's pretty cool! Jim, how do you save a file in Notepad as UTF8?

Re: How to insert Chinese characters in Vertica

Post by nnani » Mon Jul 01, 2013 2:20 pm

Hi billy,

Even I am scratching my head on this.
I didn't see any option while saving.