We have a set of huge inserts running, broken into "months". Each month takes about 3 hours to run (slow because of the amount of data, plus JOINs).
I tried throwing more RAM at it by increasing the amount in the resource pool (it was previously 40 GB with a planned concurrency of 2, which resulted in allocating 20 GB; I changed it to 60 GB with a planned concurrency of 1). That maybe sped it up from, say, 3.5 hours to 3 hours.
So RAM isn't helping that much, and the SQL already has join spills enabled. So I figured I should try doing more than one "month" in parallel, instead of doing everything serially. But I've got no idea how much disk space the join spills will use, so I don't know how aggressive I can be about running many inserts in parallel.
Join spill disk usage?
Moderator: NorbertKrupa
Re: Join spill disk usage?
Can you share the EXPLAIN output for one of these inserts?
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC
Re: Join spill disk usage?
I'll see if I can post the EXPLAIN tomorrow.
Is this a valid way of checking the spill size?
Is this a valid way of checking the spill size?
Code: Select all
select
node_name,
session_id,
transaction_id,
ROUND(SUM(counter_value)/10^9, 0) AS GB_spilled
from
execution_engine_profiles
where
counter_name = 'bytes spilled' and
user_name = 'the_user' and
counter_value > 0
GROUP BY
node_name, session_id, transaction_id
ORDER BY node_name, GB_spilled DESC;
Re: Join spill disk usage?
BTW I'm not sure the plan is too interesting. One of the tables had no statistics, and when I did stats on it, the planner produced no plan for the query, saying it ran out of memory. And it looked awful; at one point during that query the join spill was up to 0.5 TB (versus ~ 50-100 GB).