Tell planner to do merge join

Moderator: NorbertKrupa

Post Reply
dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Tell planner to do merge join

Post by dbmsuser7 » Thu Mar 12, 2015 6:53 pm

What methods are there for telling the planner to do a merge join?

I have an INSERT...SELECT query that creates a huge table. The query is a multiple join of many tables; many of them are quite large.

I don't mind the planner using a hash join in the cases where it's unavoidable or one of the tables are small. But in many of the joins, there's a simple equality join condition on a surrogate key that both tables have as the primary sort index (sorry, can't think of the Vertica-esque phrasing right now). So the merge join is the "natural" thing to do. Yet, sometimes the planner will decide to treat those as hash joins rather than merge joins, even though the tables are so huge that there's no way it can fit into memory. (One outcome of this is that the case where the biggest table in the query has a small fraction of the usual data it has, the query is 30 min slower because the planner decides that a hash join is better in one of the joins. Kind of a "given enough rope to hang itself" situation, I guess.)

I tried some of the suggestions I found on this site and e.g. at http://www.vertica.com/2014/10/22/optim ... erge-join/. Most recently I just replaced every table X (where appropriate) with

Code: Select all

(SELECT * FROM X ORDER BY the_key) AS XX;
and it did the opposite of what I wanted (instead of eliminating the possibility of a hash join, it replaced all the instances of merge joins with hash joins).

TIA,

S

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Tell planner to do merge join

Post by scutter » Thu Mar 12, 2015 10:07 pm

It’s hard to comment on this without seeing EXPLAIN output and projection definitions. Are you able to share those?

You said that there are multiple joins - it sounds like the joins are on a variety of keys? Do you always get one merge join?

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Tell planner to do merge join

Post by NorbertKrupa » Fri Mar 13, 2015 2:18 pm

Like Sharon said, we need to see the EXPLAIN at the least. If Vertica can't perform the merge join, then it probably just can't fit it all into memory. Optimally, you want the sort to be done in the projection.
Checkout vertica.tips for more Vertica resources.

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Tell planner to do merge join

Post by dbmsuser7 » Fri Mar 13, 2015 2:28 pm

I'm happy to share the plan, I can just change table names.

Problem is that one of the tables in the plan has no statistics. And I found that if I put stats on it, the plan just explodes---IIRC the planner couldn't even finish planning.

Nonetheless I'll post one of the plans in the next hour or so.

Thanks for all the help,

S

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Tell planner to do merge join

Post by scutter » Fri Mar 13, 2015 2:47 pm

Stats are a prereq for query planning. If you’ve hit a scenario where the database can’t plan a query with stats, then you should open a support ticket.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

dbmsuser7
Intermediate
Intermediate
Posts: 71
Joined: Tue Jan 28, 2014 3:03 am

Re: Tell planner to do merge join

Post by dbmsuser7 » Fri Mar 13, 2015 7:07 pm

OK.

Re posting EXPLAIN here, I'll have to get to it later today or even early next week. I don't know of a good way to anonymize the output other than doing it by hand, and that's taking a long time.

Post Reply

Return to “Vertica Database Administration”