update statistics

Posted: Tue Jun 20, 2017 9:09 pm
by sreediff
As per ... ection.htm

After loading a table with a significant volume of new data (>50% of its rows), the recommendation is to run ANALYZE_STATISTICS, which takes a 10 percent sample.
Is this enough, or should we run ANALYZE_HISTOGRAM? What would be the ideal sample percent (1-100)? Also, does Vertica automatically run ANALYZE_STATISTICS after a purge/delete or on any other trigger?

Re: update statistics

Posted: Thu Jun 29, 2017 2:56 pm
by JimKnicely

Vertica does not compute a table's complete set of statistics automatically, as that would negatively affect database performance. However, Vertica does regularly collect some statistics incrementally, like table row counts every 60 seconds (see the AnalyzeRowCountInterval configuration parameter).

dbadmin=> select parameter_name, default_value, description from configuration_parameters where parameter_name = 'AnalyzeRowCountInterval';
     parameter_name      | default_value |                             description
-------------------------+---------------+---------------------------------------------------------------------
 AnalyzeRowCountInterval | 60            | Interval between Tuple Mover row count statistics updates (seconds)
(1 row)

Sampling 10% of the rows of a table is a good trade-off between performance and accuracy. I'm an ex-Oracle DBA, and we used to rely on Oracle's automatic sampling size of 5-20%. At 20%, the stats were pretty darn close to computing on 100%. Vertica's default of 10% is fine. If you want a higher level of statistical accuracy for important columns (i.e., those used in WHERE clause predicates), then you can run ANALYZE_STATISTICS on just those columns.
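As a rough sketch, targeting specific columns looks something like this (table and column names here are made up for illustration; the optional third argument is the sample percent):

```sql
-- Sample 10% of the whole table (the default behavior):
SELECT ANALYZE_STATISTICS('public.sales');

-- Higher accuracy (100% sample) on just the predicate columns,
-- given as a comma-separated column list:
SELECT ANALYZE_STATISTICS('public.sales', 'sale_date,region_id', 100);
```

Both calls return 0 on success. Check the current Vertica docs for the exact signature on your version before relying on the column-list form.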

Check out the Workload Analyzer, which can make tuning recommendations for you (e.g., that you need to run ANALYZE_STATISTICS on a table because its "Statistics are stale (no histogram or predicate falls outside histogram)."): ... itions.htm
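For example, you can run the Workload Analyzer on demand and then review its suggestions; the table name below is hypothetical, and the exact output columns may vary by Vertica version:

```sql
-- Analyze recent workload history for one table (or omit the argument
-- to analyze the whole database):
SELECT ANALYZE_WORKLOAD('public.sales');

-- Review the recommendations it produced, including any
-- "run analyze_statistics" suggestions for stale stats:
SELECT tuning_description, tuning_command
FROM v_monitor.tuning_recommendations;
```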