Tell planner to do merge join

dbmsuser7 · Post by **dbmsuser7** » Thu Mar 12, 2015 6:53 pm

What methods are there for telling the planner to do a merge join?

I have an INSERT...SELECT query that creates a huge table. The query is a multiple join of many tables; many of them are quite large.

I don't mind the planner using a hash join in the cases where it's unavoidable or one of the tables are small. But in many of the joins, there's a simple equality join condition on a surrogate key that both tables have as the primary sort index (sorry, can't think of the Vertica-esque phrasing right now). So the merge join is the "natural" thing to do. Yet, sometimes the planner will decide to treat those as hash joins rather than merge joins, even though the tables are so huge that there's no way it can fit into memory. (One outcome of this is that the case where the biggest table in the query has a small fraction of the usual data it has, the query is 30 min slower because the planner decides that a hash join is better in one of the joins. Kind of a "given enough rope to hang itself" situation, I guess.)

I tried some of the suggestions I found on this site and e.g. at http://www.vertica.com/2014/10/22/optim ... erge-join/. Most recently I just replaced every table X (where appropriate) with

Code: Select all

(SELECT * FROM X ORDER BY the_key) AS XX;

and it did the opposite of what I wanted (instead of eliminating the possibility of a hash join, it replaced all the instances of merge joins with hash joins).

TIA,

S

scutter · Post by **scutter** » Thu Mar 12, 2015 10:07 pm

It’s hard to comment on this without seeing EXPLAIN output and projection definitions. Are you able to share those?

You said that there are multiple joins - it sounds like the joins are on a variety of keys? Do you always get one merge join?

—Sharon

NorbertKrupa · Post by **NorbertKrupa** » Fri Mar 13, 2015 2:18 pm

Like Sharon said, we need to see the EXPLAIN at the least. If Vertica can't perform the merge join, then it probably just can't fit it all into memory. Optimally, you want the sort to be done in the projection.

dbmsuser7 · Post by **dbmsuser7** » Fri Mar 13, 2015 2:28 pm

I'm happy to share the plan, I can just change table names.

Problem is that one of the tables in the plan has no statistics. And I found that if I put stats on it, the plan just explodes---IIRC the planner couldn't even finish planning.

Nonetheless I'll post one of the plans in the next hour or so.

Thanks for all the help,

S

scutter · Post by **scutter** » Fri Mar 13, 2015 2:47 pm

Stats are a prereq for query planning. If you’ve hit a scenario where the database can’t plan a query with stats, then you should open a support ticket.

—Sharon

dbmsuser7 · Post by **dbmsuser7** » Fri Mar 13, 2015 7:07 pm

OK.

Re posting EXPLAIN here, I'll have to get to it later today or even early next week. I don't know of a good way to anonymize the output other than doing it by hand, and that's taking a long time.

Tell planner to do merge join

Tell planner to do merge join

Re: Tell planner to do merge join

Re: Tell planner to do merge join

Re: Tell planner to do merge join

Re: Tell planner to do merge join

Re: Tell planner to do merge join