Page 1 of 1
Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 1:28 am
by becky
Hi all,
Does anyone have any tips or best ways to handle a COUNT DISTINCT on a very large data set (> 1 million rows in a fact table). I'm seeing my queries take > 30 seconds with the COUNT DISTINCT and about 7 seconds with a simple COUNT.
Thanks!
Re: Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 1:20 pm
by NorbertKrupa
Re: Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 1:42 pm
by BoMBaY
There is the good article about count distinct on the link below.
Avoiding the OLAP Cliff for Count Distinct Queries in Vertica
http://www.vertica.com/2014/03/05/avoid ... n-vertica/
Re: Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 1:53 pm
by NorbertKrupa
That article eludes to using an approximate count distinct.
Re: Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 2:13 pm
by becky
Thanks, guys.
I tried the approximate count distinct function and the query never returns! Well, I let it run for about two minutes and just canceled it. The COUNT DISTINCT is faster in my database. Is that weird?
Re: Performance tuning a COUNT DISTINCT
Posted: Wed Sep 17, 2014 2:47 pm
by NorbertKrupa
What does your query look like?
Re: Performance tuning a COUNT DISTINCT
Posted: Tue Sep 30, 2014 3:47 pm
by becky
Anyone know what these parameters do?
Code: Select all
dbadmin=> select parameter_name, description from configuration_parameters where parameter_name ilike '%distinct%';
parameter_name | description
--------------------------------+-------------------------------------------------------------------------------------------------------------------------
DBDCountDistinctSampleRowCount | Number of rows to be sampled for count distinct analysis during DBD design. Default: 0 Use DBDCountDistinctSampleRowPct
DBDCountDistinctSampleRowPct | Percentage of rows to be sampled for count distinct analysis during DBD design. Default: 100
(2 rows)