Page 1 of 1

analyze_statistics on "Materialize" Step

Posted: Wed Jan 27, 2016 11:44 pm
by Victorgm
Does Vertica run analyze_statistics at the Materialize step in EXPLAIN?
Thanks for your help!

Re: analyze_statistics on "Materialize" Step

Posted: Wed Jan 27, 2016 11:56 pm
by NorbertKrupa
It updates statistics on the data. If there is a materialize step in the plan, it's because the data needed to be materialized. If you don't think the columns should be materialized, do you have the proper encoding on them?

Re: analyze_statistics on "Materialize" Step

Posted: Tue Feb 02, 2016 10:56 pm
by Victorgm
Thanks Norbert.

Perhaps I should get some clarification on a few things before proceeding:
1) What is materialization? Is it fair to call it an intermediate result set, between the original projection and the final result set?

2) What else prompts the materialization step, besides a need to encode or re-encode the columns (as implied by your reply)?

Basic questions, I'm sure, but haven't found any explanation in the docs or anywhere else online, so it would be nice to check my own assumptions here.

Re: analyze_statistics on "Materialize" Step

Posted: Wed Feb 03, 2016 5:09 am
by NorbertKrupa
Materialization means that compressed or encoded data must be decompressed or decoded. An encoding such as RLE may not need materialization.

Here's some reading:

Materialization Strategies in a Column-Oriented DBMS

Re: analyze_statistics on "Materialize" Step

Posted: Wed Feb 03, 2016 2:53 pm
by scutter
Materialization is really just the reading of data from the projection. Query plans may choose to do “early materialization”, which means reading data from the projection during the initial projection scan, or “late materialization” which means deferring the read until a later step of the execution plan. Consider for example a join where many rows are filtered out by the join. It would be wasteful to read all of the columns referenced in the query before the join is evaluated because many of those rows won’t be needed since they are filtered out. So only the join key is read, the join is executed, and after the join, the remaining columns are read from disk for the rows that are emitted from the join.

Vertica doesn’t “update statistics” during query execution. It’s up to you to update statistics after data loads, or using an automated job.

—Sharon

Re: analyze_statistics on "Materialize" Step

Posted: Wed Feb 03, 2016 5:48 pm
by Victorgm
Thanks Norbert - excellent write-up.
Thanks for the nice summary explanation, Sharon.