COPY command has ETL capabilities

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

COPY command has ETL capabilities

Post by JimKnicely » Wed May 02, 2012 3:52 am

The Vertica COPY command has ETL capabilities which can be very useful.

Example:

Code: Select all

    test_db=> CREATE TABLE IPTable (ip INTEGER);
    CREATE TABLE

Code: Select all

    test_db=> COPY IPTable (anFiller FILLER VARCHAR(15), ip AS INET_ATON(anFiller)) FROM STDIN;
Vertica is now waiting for data via the standard input to copy.

Enter the data to be copied followed by a newline and end with a backslash and a period on a line by itself.

Example:

Code: Select all

    >> 209.207.224.40
    >> 192.168.1.1
    >> \.

Code: Select all

    test_db=> select * from IPTable ;
         ip     
    ------------
     3520061480
     3232235777
    (2 rows)
I hope this example shows you how Vertica functions + FILLER = base ETL tool. Note that it better to store IPs as integers - take less disk space, faster for comparison, sorting.

PS. Thanks to vertica-forums user sKwa for this great information!
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

seanarcher7
Newbie
Newbie
Posts: 7
Joined: Thu Apr 04, 2013 8:43 pm

Re: COPY command has ETL capabilities

Post by seanarcher7 » Fri Apr 12, 2013 1:28 pm

sKwa, knicely87,

Assume if a file used for bulk loading has 20 million records with 7 columns, then I would like to add 3 additional static auditing columns to all the 20 million records while loading using copy command. Is it possible to include these columns with copy command (Ofcourse table has 7+3 columns defined already). Note that the additional static columns will be a constant for all the 20 million records. 3 columns are load_dt, load_tm and filename

My question is following:
1) Can copy command handle this or
2) Should it be handled before loading the file in UNIX. This approach has a drop back because the load_dt and time do not reflect real time loading but the time the file is created.

Please provide your inputs/thoughts/suggestion.

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: COPY command has ETL capabilities

Post by id10t » Sat Apr 13, 2013 12:21 am

Hi!

1. Yes. See example.
2. No, you can define it in the COPY/INSERT statement.

Example

File (TAB as delimiter):

Code: Select all

$ cat /tmp/seanarcher_data.log
1	foo	bar
2	egg	baz
3	tux	zoo
Table DDL:

Code: Select all

CREATE TABLE public.seanarcher
(
    col1 int,
    col2 varchar(3),
    col3 varchar(3),
    LOAD_DT date,
    LOAD_TM time,
    SRC_FILE varchar(1024)
);
alter table seanarcher alter column LOAD_DT set default sysdate::DATE;
alter table seanarcher alter column LOAD_TM set default sysdate::TIME;
COPY statement:

Code: Select all

daniel=> copy seanarcher(col1, col2, col3, src_file as '/tmp/seanarcher_data.log') from '/tmp/seanarcher_data.log' direct delimiter e'\t';
 Rows Loaded 
-------------
           3
(1 row)
Validation:

Code: Select all

daniel=> select * from seanarcher ;
 col1 | col2 | col3 |  LOAD_DT   |     LOAD_TM     |         SRC_FILE         
------+------+------+------------+-----------------+--------------------------
    1 | foo  | bar  | 2013-04-13 | 02:20:21.265184 | /tmp/seanarcher_data.log
    2 | egg  | baz  | 2013-04-13 | 02:20:21.265184 | /tmp/seanarcher_data.log
    3 | tux  | zoo  | 2013-04-13 | 02:20:21.265184 | /tmp/seanarcher_data.log
(3 rows)

seanarcher7
Newbie
Newbie
Posts: 7
Joined: Thu Apr 04, 2013 8:43 pm

Re: COPY command has ETL capabilities

Post by seanarcher7 » Sun Apr 14, 2013 2:45 am

Thanks so much sKwa. This is great idea. But is it possible to not include c1 c2 and c3 as part of copy command but still achieve this?

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: COPY command has ETL capabilities

Post by id10t » Sun Apr 14, 2013 6:36 am

Hi!

No.

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: COPY command has ETL capabilities

Post by nnani » Mon Jun 24, 2013 1:18 pm

Hello,

I have a scenario where in I am using a copy command to load a table.

The flat file just has one column which need to be loaded, the table has 5 columns need to be loaded.
The rest 4 columns are likewise

3 columns are hard coded and one column will be sysdate()

Can we do this. If yes, then how
nnani........
Long way to go

You can check out my blogs at vertica-howto

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: COPY command has ETL capabilities

Post by nnani » Tue Jun 25, 2013 2:55 pm

Hello all,

I got the solution for this one.
It was very simple

Code: Select all

COPY n_tab( c1, c1 as '2', c3 as 'abc', c4 a '28', c5 as sysdate()) from 'path/to/file'
:lol:
nnani........
Long way to go

You can check out my blogs at vertica-howto

Post Reply

Return to “Vertica Data Load”