Loads into pre-join projections of 2 fact tables don't scale

Forum to discuss Vertica performance tuning

Moderator: norbertk

JakaJancar
Newbie
Newbie
Posts: 5
Joined: Wed Jan 30, 2013 1:27 pm

Loads into pre-join projections of 2 fact tables don't scale

Postby JakaJancar » Wed Jan 30, 2013 1:41 pm

Hello,

We are trying to create a pre-join projection that joins two large fact tables.

The problem is that INSERTs are slow: they take O(existing_data), not O(added_data).

In practice, this means that after importing a couple of days worth of data, they become prohibitively slow.

The projections are written so that:

- INSERT uses MERGE JOINs only.
- Data for INSERT is presorted, i.e. no (SORT ON JOIN KEY).
- When running on a multi-node cluster, there is no (RESEGMENT).
- There's an extra batch_id column, that serves as a bucket, as per this help page.

Doing a SELECT .. JOIN .. for what I think Vertica has to do when populating the projections *is fast*, but INSERT .. SELECT is slow.

The following script demonstrates the issue:

https://dl.dropbox.com/u/41764/vertica/ ... ng-test.sh

The script will:

Code: Select all

 1. create a schema with 2 tables, `a` and `b`
 2. create "normal" projections for `a` and prejoin for `b` x `a`
 3. repeat 100 times
     3.1. load 1M rows into `a_temp`
     3.2. load 1M rows into `b_temp`
     3.3. insert into a select * from a_temp
     3.4. insert into b select * from b_temp


You will see that the times for #3.3 are constant, but the times for #3.4 grow with the number of *existing*, not *inserted* rows. Here is a chart:

Image

Has anyone had success with pre-join projections over 2 large tables?

Best,
Jaka

Sponsored Links

Advertisements

Advertisements
 

User avatar
becky
Intermediate
Intermediate
Posts: 117
Joined: Sat Apr 28, 2012 11:37 am

Re: Loads into pre-join projections of 2 fact tables don't s

Postby becky » Wed Jan 30, 2013 7:30 pm

Hi Jaka,

Nice write up!

I've never had any luck using INSERT statements (> 1000) into any Vertica tables, regardless of the type of projections the table has. Have you tried using the COPY command? I had a case open with Vertica a while back and the guy told me to always use the COPY command. Another option is have you tried not having the foreien key between the two fact tables. By the way, are you using a snow flake schema?

I'm sure someone on this board can help provide better advice for you!
THANKS - BECKSTER

JakaJancar
Newbie
Newbie
Posts: 5
Joined: Wed Jan 30, 2013 1:27 pm

Re: Loads into pre-join projections of 2 fact tables don't s

Postby JakaJancar » Wed Jan 30, 2013 8:08 pm

Hi Becky,

Similarly to how it's done in the example reproduction script, we

1. insert into a temp table, using JDBC batch inserts (which under the hood should get converted to COPY), then
2. do "insert into final_table select * from temp_table".

Indeed we have not tried doing COPY directly into the final table, for other reasons, but I'm sceptical that it would perform any better...

As for removing the foreign key, you need that to create the pre-join projection.

Our schema consists of multiple fact tables that contain nested/related facts, which you want to join together (in addition to dimension tables).

User avatar
Julie
Master
Master
Posts: 207
Joined: Thu Apr 19, 2012 9:29 pm

Re: Loads into pre-join projections of 2 fact tables don't s

Postby Julie » Wed Jan 30, 2013 9:18 pm

FYI ..

This is a reponse I got from Vertica support about INSERT vs. COPY command:

In the present release of vertica inserts and updates are working as designed. If an insert statement takes about x milli seconds then running it y times serially takes approximately x*y milli seconds. But if we run multiple insert statement parallel based on the resources available on the system it takes the time.

If there are any possibilities of batching up the data and doing bulk load with copy is the preferred method over individual inserts to get good performance.

Basically, insert statements suck in Vertica :mrgreen:

If you still want to use inserts, we found that separating the inserts on different nodes so that they run in parallel helped with performance.
Thanks,
Juliette

zvika
Beginner
Beginner
Posts: 25
Joined: Thu Apr 19, 2012 7:55 am

Re: Loads into pre-join projections of 2 fact tables don't s

Postby zvika » Thu Jan 31, 2013 3:56 pm

Julie wrote:FYI ..


Basically, insert statements suck in Vertica :mrgreen:

If you still want to use inserts, we found that separating the inserts on different nodes so that they run in parallel helped with performance.



This is interesting as Updates also sucks .... :-)


Return to “Vertica Performance Tuning”

Who is online

Users browsing this forum: No registered users and 1 guest