Need Syntax for Hash And Merge Join
Moderator: NorbertKrupa
Need Syntax for Hash And Merge Join
What is the syntax for Hash And Merge Join in Vertica Database ??
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Need Syntax for Hash And Merge Join
Hi, saidaraom!
What do you mean by "syntax for Hash And Merge Join"? I do not believe that there is any specific syntax to force a join to be a HASH or MERGE, but we can influence Vertica to choose one or the other.
Here are some details about each join in Vertica:
MERGE:
For example:
Say I create the following two tables named test1 and test2:
The default projections are sorted by col1 and then col2 for each table:
So if I check the explain plan for a SELECT joining the two tables on col2, I'll see a HASH join:
Now if I create projections for the tables sorted first by col2 and then col1, I can get a MERGE join!
I hope this helps a little!
What do you mean by "syntax for Hash And Merge Join"? I do not believe that there is any specific syntax to force a join to be a HASH or MERGE, but we can influence Vertica to choose one or the other.
Here are some details about each join in Vertica:
MERGE:
- 1. Offers the best performance
2. Optimizer may choose if both inputs are pre-sorted on the join column
3. Vertica can also performs a sort-merge join where the inner input is unsorted but only if the outer input is already sorted on the join keys
4. If the incoming data isn't already sorted correctly, you can facilitate a merge join by adding a projection that is sorted correctly
- 1. Used for joining large data sets
2. No sort requirement
3. Consumes more memory because a hash table is built with the values in the inner table
4. Cost of performing a hash join is low if the entire hash table can fit in memory
5. The optimizer chooses a hash join when projections are not sorted on the join columns
For example:
Say I create the following two tables named test1 and test2:
Code: Select all
dbadmin=> CREATE TABLE test1 (col1 INT, col2 VARCHAR(100)) UNSEGMENTED ALL NODES;
CREATE TABLE
dbadmin=> CREATE TABLE test2 (col1 INT, col2 VARCHAR(100)) UNSEGMENTED ALL NODES;
CREATE TABLE
Code: Select all
dbadmin=> SELECT p.projection_name,
dbadmin-> pc.projection_column_name,
dbadmin-> pc.column_position
dbadmin-> FROM projections p
dbadmin-> JOIN projection_columns pc
dbadmin-> ON pc.projection_name = p.projection_name
dbadmin-> WHERE p.anchor_table_name IN ('test1', 'test2')
dbadmin-> ORDER
dbadmin-> BY p.projection_name,
dbadmin-> pc.column_position;
projection_name | projection_column_name | column_position
-----------------+------------------------+-----------------
test1_node0001 | col1 | 0
test1_node0001 | col2 | 1
test1_node0002 | col1 | 0
test1_node0002 | col2 | 1
test1_node0003 | col1 | 0
test1_node0003 | col2 | 1
test2_node0001 | col1 | 0
test2_node0001 | col2 | 1
test2_node0002 | col1 | 0
test2_node0002 | col2 | 1
test2_node0003 | col1 | 0
test2_node0003 | col2 | 1
(12 rows)
Code: Select all
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT t1.col2, t2.col2
FROM test1 t1
JOIN test2 t2
ON t1.col2 = t2.col2;
Access Path:
+-JOIN HASH [Cost: 2K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Join Cond: (t1.col2 = t2.col2)
| Execute on: Query Initiator
| +-- Outer -> STORAGE ACCESS for t1 [Cost: 649, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Projection: intersect_wh.test1_node0001
| | Materialize: t1.col2
| | Execute on: Query Initiator
| | Runtime Filter: (SIP1(HashJoin): t1.col2)
| +-- Inner -> STORAGE ACCESS for t2 [Cost: 649, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | Projection: intersect_wh.test2_node0001
| | Materialize: t2.col2
| | Execute on: Query Initiator
Code: Select all
dbadmin=> CREATE PROJECTION test1_p AS SELECT * FROM test1 ORDER BY col2, col1;
CREATE PROJECTION
dbadmin=> CREATE PROJECTION test2_p AS SELECT * FROM test2 ORDER BY col2, col1;
CREATE PROJECTION
Code: Select all
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT t1.col2, t2.col2
FROM test1 t1
JOIN test2 t2
ON t1.col2 = t2.col2;
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Join Cond: (t1.col2 = t2.col2)
| Execute on: Query Initiator
| +-- Outer -> STORAGE ACCESS for t1 [Cost: 649, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Projection: intersect_wh.test1_p
| | Materialize: t1.col2
| | Execute on: Query Initiator
| | Runtime Filter: (SIP1(MergeJoin): t1.col2)
| +-- Inner -> STORAGE ACCESS for t2 [Cost: 649, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
| | Projection: intersect_wh.test2_p
| | Materialize: t2.col2
| | Execute on: Query Initiator
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.