How can I improve performance when loading data with COPY?

Moderator: NorbertKrupa

id10t
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Post by id10t » Wed Aug 28, 2013 10:55 am

Hi!

To achieve parallelism, you don't have to split an uncompressed CSV file yourself:
Using the Parallel Load Library

You can use the HP Vertica PloadDelimitedSource library with the COPY statement to parallel load delimited files. This section refers to the functionality as Pload.

The Pload feature is ideal for loading very large data files (at least 10 GB). The library divides file parsing tasks across each core on the server node where the data file resides, significantly reducing file load time. For example, given a 12 GB file to load, performance could be 3 to 5 times faster than loading the file without Pload.

You can specify the file division size (in bytes) by supplying an integer value to the Pload chunk_size parameter.

After installing the library, you can use the COPY statement's WITH SOURCE PloadDelimitedSource parameter. Using Pload with COPY LOCAL is not supported.
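The quoted text doesn't describe how Pload divides a file internally, but the general idea of cutting one delimited file into byte ranges that can be parsed independently can be sketched in Python. This is a generic illustration, not Vertica's implementation: `chunk_boundaries` is a hypothetical helper, and its `chunk_size` argument only mirrors the role the quoted text gives to Pload's chunk_size. Each range is stretched to the next newline so no record is split, which assumes newline-terminated records with no embedded newlines inside quoted fields.

```python
import os


def chunk_boundaries(path, chunk_size):
    """Split a newline-delimited file into (start, end) byte ranges of
    roughly chunk_size bytes, never cutting a record in half.

    Hypothetical helper. Assumes every record ends with a newline and that
    no field contains an embedded newline (e.g. inside a quoted CSV value).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    file_size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < file_size:
            end = min(start + chunk_size, file_size)
            if end < file_size:
                # Back up one byte and read to the end of that line: if that
                # byte is already a newline, end stays where it is; otherwise
                # end moves forward to finish the record that straddles it.
                f.seek(end - 1)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges
```

Each `(start, end)` pair could then go to a separate worker that seeks to `start` and parses up to `end`; spreading that parsing work across cores is what the quoted text credits for the shorter load time.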

muaythai_duan
Newbie
Posts: 18
Joined: Sun Aug 25, 2013 11:43 am

Post by muaythai_duan » Wed Aug 28, 2013 12:53 pm

Hi sKwa,
where can I download the library and its documentation? Thanks!

Post by id10t » Wed Aug 28, 2013 2:48 pm

Hi!

Did you read this? (I also provided a link in my previous post.) You have to read it, because it explains how to use the library and how to INSTALL it!

Post by muaythai_duan » Wed Aug 28, 2013 4:36 pm

I will read that page and pay more attention to practice. Thank you!

nnani
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am

Post by nnani » Thu Aug 29, 2013 7:05 am

Thanks for that, sKwa.

Learnt something new. :)
nnani........
Long way to go

You can check out my blogs at vertica-howto

Post by muaythai_duan » Thu Aug 29, 2013 11:51 pm

Hi nnani,

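\set t_pwd `pwd`
-- hypothetical file names: one variable per input file in the current directory
\set input_file1 '''':t_pwd'/part1.csv'''
\set input_file2 '''':t_pwd'/part2.csv'''
\set input_file3 '''':t_pwd'/part3.csv'''
-- replace nodename with the node that holds the files
copy test from :input_file1 on nodename delimiter ',' direct;
copy test from :input_file2 on nodename delimiter ',' direct;
copy test from :input_file3 on nodename delimiter ',' direct;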
If the files are all on one node, these commands are executed one after another, not in parallel. I am sure of it.

Post by id10t » Fri Aug 30, 2013 6:41 am

Hi!

A vsql script is always executed sequentially, even if the data is on different nodes. vsql can't run a statement in the background.

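Since one vsql session runs its statements one at a time, a common workaround is to open several sessions with one COPY each. The Python sketch below does that by launching a separate `vsql -c` process per file. `build_copy_command` and `run_parallel` are hypothetical helpers, the `ON node` clause and `DIRECT` keyword simply mirror the statements earlier in this thread, and the sketch assumes `vsql` is on the PATH and can connect without extra options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_copy_command(table, path, node=None, delimiter=","):
    """Hypothetical helper: build a vsql command line that runs one COPY.

    Assumes vsql can connect with its default settings; append your own
    connection options to the returned list if it can't. The path is
    wrapped in single quotes, so it must not contain any.
    """
    source = f"'{path}'"
    if node is not None:
        source += f" ON {node}"  # the node that holds the file
    sql = f"COPY {table} FROM {source} DELIMITER '{delimiter}' DIRECT;"
    return ["vsql", "-c", sql]


def run_parallel(commands, max_workers=4):
    """Run each command as its own OS process, at most max_workers at a
    time, and return their exit codes in input order."""
    def run_one(cmd):
        # Each call starts a separate process (a separate vsql session);
        # the thread only waits for it to finish.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

For the three files in the script above you would build one command per file, for example `build_copy_command("test", path, node="nodename")`, and pass the list to `run_parallel`; a nonzero exit code means that COPY failed. Whether several concurrent sessions loading the same table are actually faster depends on the cluster's resources, so it is worth measuring.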