DISTINCT vs. GROUP BY
Posted: Fri Nov 15, 2013 4:21 pm
Is there a difference between DISTINCT and GROUP BY in queries where no aggregate functions are used? Does one perform better on very large data sets?
Code: Select all
dbadmin=> select count(*) from abc;
count
-----------
365496300
(1 row)
Code: Select all
dbadmin=> explain select distinct a, b, c from abc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain select distinct a, b, c from abc;
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 5M, Rows: 10M] (PATH ID: 1)
| Group By: abc.a, abc.b, abc.c
| Execute on: All Nodes
| +---> STORAGE ACCESS for abc [Cost: 4M, Rows: 365M] (PATH ID: 2)
| | Projection: test.abc_b0
| | Materialize: abc.a, abc.b, abc.c
| | Execute on: All Nodes
Code: Select all
dbadmin=> explain select a, b, c from abc group by a, b, c;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain select a, b, c from abc group by a, b, c;
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 4M, Rows: 10M] (PATH ID: 1)
| Group By: abc.a, abc.b, abc.c
| Execute on: All Nodes
| +---> STORAGE ACCESS for abc [Cost: 3M, Rows: 365M] (PATH ID: 2)
| | Projection: test.abc_b0
| | Materialize: abc.a, abc.b, abc.c
| | Execute on: All Nodes