copy load data that may contain the delimiter

Moderator: NorbertKrupa

Post Reply
kobymeir
Newbie
Newbie
Posts: 1
Joined: Fri Jun 19, 2015 9:07 pm

copy load data that may contain the delimiter

Post by kobymeir » Mon Jun 29, 2015 2:19 pm

I'm currently using copy command to load data into vertica.
I volumes of 1TB gziped data daily.

copy table (col1,col2,col3) FROM 1.dat GZIP DELIMITER '|'

My problem is that my data that goes into the columns may contain the delimiter '|' or any other asscii character.
From the documentation there is an option to use ESCAPE AS '~' which mean i will need to escape any data in column that may contain '|' and change it to '~|'
my only question is what happend if my data also has '~|' before i escape it? should i change it to '~~|' or any other alternative?
Would using ENCLOSED BY will help me with it? and what do i need to do if my data contains the ENCLOSED BY chatacter?
I prefer not to use the ENCLOSED BY as it will increase the data transfer of my files, but if that's the best solution i will reconsider.

example for raw data in my system:
(the number represents the column that data belongs to)
1|2|2|3
1|2|3
1|2~|2|3
1|2"2|3
||
|2|3
|2|
|2|3
1||
1||3
1|2|
1|2|3

In all of the above cases the original data contains 3 columns of raw data before escaping it.

Thanks for the help.
Koby

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: copy load data that may contain the delimiter

Post by JimKnicely » Mon Jun 29, 2015 3:14 pm

Hmm. I wonder if you could try something like this?

Code: Select all

dbadmin=> create table test (col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE

dbadmin=> copy test from '/home/dbadmin/test.txt' delimiter '|' trailing nullcols ;
 Rows Loaded
-------------
          11
(1 row)

dbadmin=> select * from test order by col1, col2;
 col1 | col2 | col3 | col4
------+------+------+------
 1    | 2    | 2    | 3
 1    | 2    | 3    |
 1    | 2    |      |
 1    | 2"2  | 3    |
 1    | 2~   | 2    | 3
 1    |      | 3    |
 1    |      |      |
      | 2    | 3    |
      | 2    | 3    |
      | 2    |      |
      |      |      |
(11 rows)

dbadmin=> update test
dbadmin->    set col2 = (case when col4 is not null then col2 || '|' || col3 else col2 end),
dbadmin->        col3 = nvl2(col4, col4, col3);
 OUTPUT
--------
     11
(1 row)

dbadmin=> alter table test drop column col4 cascade;
ALTER TABLE

dbadmin=> select * from test order by col1, col2;
 col1 | col2 | col3
------+------+------
 1    | 2    | 3
 1    | 2    |
 1    | 2"2  | 3
 1    | 2|2  | 3
 1    | 2~|2 | 3
 1    |      | 3
 1    |      |
      | 2    | 3
      | 2    | 3
      | 2    |
      |      |
(11 rows)
An alternate solution could be to load into a separate table and then copy the cleansed data over to your real table:

Code: Select all

dbadmin=> drop table test;
DROP TABLE

dbadmin=> create table test (col1 varchar(10), col2 varchar(10), col3 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE

dbadmin=> create table test_load (col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE

dbadmin=> copy test_load from '/home/dbadmin/test.txt' delimiter '|' trailing nullcols ;
 Rows Loaded
-------------
          11
(1 row)

dbadmin=> insert /*+ direct */ into test
dbadmin-> select col1, (case when col4 is not null then decode(ascii(col2), null, col2 || col3, col2 || '|' || col3) else col2 end),
dbadmin->        nvl2(col4, col4, col3)
dbadmin->   from test_load;
 OUTPUT
--------
     11
(1 row)

dbadmin=> select * from test order by 1, 2;
 col1 | col2 | col3
------+------+------
 1    | 2    | 3
 1    | 2    |
 1    | 2"2  | 3
 1    | 2|2  | 3
 1    | 2~|2 | 3
 1    |      | 3
 1    |      |
      | 2    | 3
      | 2    | 3
      | 2    |
      |      |
(11 rows)

dbadmin=> truncate table test_load;
TRUNCATE TABLE
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica Database Development”