Hi all,
Does anyone have any tips or best ways to handle a COUNT DISTINCT on a very large data set (> 1 million rows in a fact table). I'm seeing my queries take > 30 seconds with the COUNT DISTINCT and about 7 seconds with a simple COUNT.
Thanks!
Performance tuning a COUNT DISTINCT
Moderator: NorbertKrupa
Performance tuning a COUNT DISTINCT
THANKS - BECKSTER
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Performance tuning a COUNT DISTINCT
If you're on 7.0+, you may want to look at Optimizing COUNT (DISTINCT) by Calculating Approximate Counts.
Checkout vertica.tips for more Vertica resources.
Re: Performance tuning a COUNT DISTINCT
There is the good article about count distinct on the link below.
Avoiding the OLAP Cliff for Count Distinct Queries in Vertica
http://www.vertica.com/2014/03/05/avoid ... n-vertica/
Avoiding the OLAP Cliff for Count Distinct Queries in Vertica
http://www.vertica.com/2014/03/05/avoid ... n-vertica/
Itipong Chewinpipat (Bay)
DBA Specialist (Vertica/Oracle)
DBA Specialist (Vertica/Oracle)
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Performance tuning a COUNT DISTINCT
That article eludes to using an approximate count distinct.
Checkout vertica.tips for more Vertica resources.
Re: Performance tuning a COUNT DISTINCT
Thanks, guys.
I tried the approximate count distinct function and the query never returns! Well, I let it run for about two minutes and just canceled it. The COUNT DISTINCT is faster in my database. Is that weird?
I tried the approximate count distinct function and the query never returns! Well, I let it run for about two minutes and just canceled it. The COUNT DISTINCT is faster in my database. Is that weird?
THANKS - BECKSTER
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Performance tuning a COUNT DISTINCT
What does your query look like?
Checkout vertica.tips for more Vertica resources.
Re: Performance tuning a COUNT DISTINCT
Anyone know what these parameters do?
Code: Select all
dbadmin=> select parameter_name, description from configuration_parameters where parameter_name ilike '%distinct%';
parameter_name | description
--------------------------------+-------------------------------------------------------------------------------------------------------------------------
DBDCountDistinctSampleRowCount | Number of rows to be sampled for count distinct analysis during DBD design. Default: 0 Use DBDCountDistinctSampleRowPct
DBDCountDistinctSampleRowPct | Percentage of rows to be sampled for count distinct analysis during DBD design. Default: 100
(2 rows)
THANKS - BECKSTER