DISTINCT vs. GROUP BY

Moderator: NorbertKrupa

Post Reply
JeffSatler
Newbie
Newbie
Posts: 21
Joined: Mon Feb 13, 2012 9:44 pm

DISTINCT vs. GROUP BY

Post by JeffSatler » Fri Nov 15, 2013 4:21 pm

Is there a difference between DISTINCT and GROUP BY in queries where no aggregate functions are used? Does one perform better on very large data sets?

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: DISTINCT vs. GROUP BY

Post by id10t » Fri Nov 15, 2013 5:41 pm

Hi!

[DELETED]
Last edited by id10t on Sat May 09, 2015 3:18 pm, edited 1 time in total.

JeffSatler
Newbie
Newbie
Posts: 21
Joined: Mon Feb 13, 2012 9:44 pm

Re: DISTINCT vs. GROUP BY

Post by JeffSatler » Fri Nov 15, 2013 7:57 pm

Thanks. They seem to generate the same explain plan... However, in the example below, the GROUP BY has a lower associated cost?

Code: Select all

dbadmin=> select count(*) from abc;
   count
-----------
 365496300
(1 row)

Code: Select all

dbadmin=> explain select distinct a, b, c from abc;
                                                                                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select distinct a, b, c from abc;

 Access Path:
 +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 5M, Rows: 10M] (PATH ID: 1)
 |  Group By: abc.a, abc.b, abc.c
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for abc [Cost: 4M, Rows: 365M] (PATH ID: 2)
 | |      Projection: test.abc_b0
 | |      Materialize: abc.a, abc.b, abc.c
 | |      Execute on: All Nodes
----------

Code: Select all

dbadmin=> explain select a, b, c from abc group by a, b, c;
                                                                                                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select a, b, c from abc group by a, b, c;

 Access Path:
 +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 4M, Rows: 10M] (PATH ID: 1)
 |  Group By: abc.a, abc.b, abc.c
 |  Execute on: All Nodes
 | +---> STORAGE ACCESS for abc [Cost: 3M, Rows: 365M] (PATH ID: 2)
 | |      Projection: test.abc_b0
 | |      Materialize: abc.a, abc.b, abc.c
 | |      Execute on: All Nodes
I wonder why?

Post Reply

Return to “Vertica SQL”