Join spill disk usage?

Moderator: NorbertKrupa

Post Reply
dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Join spill disk usage?

Post by dbmsuser7 » Tue Feb 24, 2015 2:59 pm

We have a set of huge inserts running, broken into "months". Each month takes about 3 hours to run (slow because of the amount of data, plus JOINs).

I tried throwing more RAM at it by increasing the amount in the resource pool (it was previously 40 GB with a planned concurrency of 2, which resulted in allocating 20 GB; I changed it to 60 GB with a planned concurrency of 1). That maybe sped it up from, say, 3.5 hours to 3 hours.

So RAM isn't helping that much, and the SQL already has join spills enabled. So I figured I should try doing more than one "month" in parallel, instead of doing everything serially. But I've got no idea how much disk space the join spills will use, so I don't know how aggressive I can be about running many inserts in parallel.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Join spill disk usage?

Post by scutter » Tue Feb 24, 2015 5:51 pm

Can you share the EXPLAIN output for one of these inserts?
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Join spill disk usage?

Post by dbmsuser7 » Wed Feb 25, 2015 3:12 am

I'll see if I can post the EXPLAIN tomorrow.

Is this a valid way of checking the spill size?

Code: Select all

select
  node_name,
  session_id,
  transaction_id,
  ROUND(SUM(counter_value)/10^9, 0) AS GB_spilled
from
  execution_engine_profiles
where
  counter_name = 'bytes spilled' and
  user_name = 'the_user' and
  counter_value > 0
GROUP BY
  node_name, session_id, transaction_id
ORDER BY node_name, GB_spilled DESC;

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Join spill disk usage?

Post by dbmsuser7 » Wed Feb 25, 2015 3:14 am

BTW I'm not sure the plan is too interesting. One of the tables had no statistics, and when I did stats on it, the planner produced no plan for the query, saying it ran out of memory. And it looked awful; at one point during that query the join spill was up to 0.5 TB (versus ~ 50-100 GB).

Post Reply

Return to “Vertica Database Administration”