The Vertica Database Forums

Posted: **Wed Sep 17, 2014 1:28 am**

Hi all,

Does anyone have any tips or best ways to handle a COUNT DISTINCT on a very large data set (> 1 million rows in a fact table). I'm seeing my queries take > 30 seconds with the COUNT DISTINCT and about 7 seconds with a simple COUNT.

Thanks!

Posted: **Wed Sep 17, 2014 1:20 pm**

If you're on 7.0+, you may want to look at Optimizing COUNT (DISTINCT) by Calculating Approximate Counts.

Posted: **Wed Sep 17, 2014 1:42 pm**

There is the good article about count distinct on the link below.

Avoiding the OLAP Cliff for Count Distinct Queries in Vertica
http://www.vertica.com/2014/03/05/avoid ... n-vertica/

Posted: **Wed Sep 17, 2014 1:53 pm**

That article eludes to using an approximate count distinct.

Posted: **Wed Sep 17, 2014 2:13 pm**

Thanks, guys.

I tried the approximate count distinct function and the query never returns! Well, I let it run for about two minutes and just canceled it. The COUNT DISTINCT is faster in my database. Is that weird?

Posted: **Wed Sep 17, 2014 2:47 pm**

What does your query look like?

Posted: **Tue Sep 30, 2014 3:47 pm**

Anyone know what these parameters do?

Code: Select all

dbadmin=> select parameter_name, description from configuration_parameters where parameter_name ilike '%distinct%';
         parameter_name         |                                                       description
--------------------------------+-------------------------------------------------------------------------------------------------------------------------
 DBDCountDistinctSampleRowCount | Number of rows to be sampled for count distinct analysis during DBD design. Default: 0 Use DBDCountDistinctSampleRowPct
 DBDCountDistinctSampleRowPct   | Percentage of rows to be sampled for count distinct analysis during DBD design. Default: 100
(2 rows)

The Vertica Database Forums

Performance tuning a COUNT DISTINCT

Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT

Re: Performance tuning a COUNT DISTINCT