The Vertica Database Forums

Posted: **Fri Nov 15, 2013 4:21 pm**

Is there a difference between DISTINCT and GROUP BY in queries where no aggregate functions are used? Does one perform better on very large data sets?

Posted: **Fri Nov 15, 2013 5:41 pm**

Hi!

[DELETED]

Posted: **Fri Nov 15, 2013 7:57 pm**

Thanks. They seem to generate the same explain plan... However, in the example below, the GROUP BY has a lower associated cost?

Code: Select all

dbadmin=> select count(*) from abc;
   count
-----------
 365496300
(1 row)

Code: Select all

dbadmin=> explain select distinct a, b, c from abc;
                                                                                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select distinct a, b, c from abc;

 Access Path:
 +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 5M, Rows: 10M] (PATH ID: 1)
 |  Group By: abc.a, abc.b, abc.c
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for abc [Cost: 4M, Rows: 365M] (PATH ID: 2)
 | |      Projection: test.abc_b0
 | |      Materialize: abc.a, abc.b, abc.c
 | |      Execute on: All Nodes

----------

Code: Select all

dbadmin=> explain select a, b, c from abc group by a, b, c;
                                                                                                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select a, b, c from abc group by a, b, c;

 Access Path:
 +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 4M, Rows: 10M] (PATH ID: 1)
 |  Group By: abc.a, abc.b, abc.c
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for abc [Cost: 3M, Rows: 365M] (PATH ID: 2)
 | |      Projection: test.abc_b0
 | |      Materialize: abc.a, abc.b, abc.c
 | |      Execute on: All Nodes

I wonder why?

The Vertica Database Forums

DISTINCT vs. GROUP BY

DISTINCT vs. GROUP BY

Re: DISTINCT vs. GROUP BY

Re: DISTINCT vs. GROUP BY