Add column, then efficiently fill it

Moderator: NorbertKrupa

Post Reply
dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Add column, then efficiently fill it

Post by dbmsuser7 » Tue Mar 04, 2014 5:49 pm

Suppose I add a column to a table that already exists, then want to fill the column by joining the original table against other tables. Is there an efficient way to do this?

Problem origin: we have a very large table created by joining a bunch of other tables. Sometimes we want to add a field that's in the join (which previously wasn't selected to be in the big table). We could drop the large table and start all over again, but the table is large enough that that's prohibitively expensive.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Add column, then efficiently fill it

Post by NorbertKrupa » Tue Mar 04, 2014 6:37 pm

The way I see you doing this is something like:

Code: Select all

INSERT INTO tableA (col1, col2, col3)
SELECT a.col1, a.col2, b.col1
FROM tableA a
JOIN tableB b ...
The only thing I would suggest is to use /*+direct*/ to write directly to disk and bypass WOS.
Checkout vertica.tips for more Vertica resources.

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Add column, then efficiently fill it

Post by dbmsuser7 » Wed Mar 05, 2014 4:29 pm

That's the basic idea, though we're really doing an update, not an insert.

Of course, updates are extremely expensive. So, conceptually, we're recreating each partition of the table using a join, then replacing the old partition with the new one.

Apparently (this started before I started my job) it uses up so much memory that we've been doing the JOIN in the filesystem (probably with C or Python).

The only way I can think of to do it entirely within Vertica is to chop each partition up into smaller ROSs, then replace each one, then merge them. But I haven't tried that yet.

(The table itself is on the order of 1 TB in size.)

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Add column, then efficiently fill it

Post by NorbertKrupa » Wed Mar 05, 2014 6:04 pm

With an UPDATE, you're doing a DELETE and an INSERT. When you say that you're doing a JOIN in the filesystem, have you considered using pre-JOIN projections? This will obviously impact your load performance but you save on query cost as it does not have to optimzie for any type of JOIN.
Checkout vertica.tips for more Vertica resources.

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Add column, then efficiently fill it

Post by dbmsuser7 » Wed Mar 05, 2014 7:50 pm

I'm not completely privy to the background, since it happened before I joined up, but my understanding is that at the time there was something wrong with pre-join projections. I'm not sure if it was a bug, or whether it didn't mesh well with our use case. I do know that someone on my team talked to Vertica and it was suggested not to use them.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Add column, then efficiently fill it

Post by NorbertKrupa » Wed Mar 05, 2014 10:44 pm

Hmm, take a look at this blog post discussing pre-join projections. There may be something in there that justifies not using pre-joins. The other thing that comes to mind is trying:

Code: Select all

INSERT /*+direct*/ INTO tableA (col1, col2, col3)
SELECT a.col1, a.col2, b.col1
FROM tableA a
JOIN tableB b ...
Checkout vertica.tips for more Vertica resources.

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Add column, then efficiently fill it

Post by dbmsuser7 » Wed Mar 05, 2014 11:24 pm

Thanks---that blog post was really informative!

Post Reply

Return to “Vertica SQL”