I have an INSERT...SELECT query that creates a huge table. The query is a multiple join of many tables; many of them are quite large.
I don't mind the planner using a hash join in the cases where it's unavoidable or one of the tables are small. But in many of the joins, there's a simple equality join condition on a surrogate key that both tables have as the primary sort index (sorry, can't think of the Vertica-esque phrasing right now). So the merge join is the "natural" thing to do. Yet, sometimes the planner will decide to treat those as hash joins rather than merge joins, even though the tables are so huge that there's no way it can fit into memory. (One outcome of this is that the case where the biggest table in the query has a small fraction of the usual data it has, the query is 30 min slower because the planner decides that a hash join is better in one of the joins. Kind of a "given enough rope to hang itself" situation, I guess.)
I tried some of the suggestions I found on this site and e.g. at http://www.vertica.com/2014/10/22/optim ... erge-join/. Most recently I just replaced every table X (where appropriate) with
Code: Select all
(SELECT * FROM X ORDER BY the_key) AS XX;
TIA,
S