analyze_statistics on "Materialize" Step

Moderator: NorbertKrupa

Post Reply
Victorgm
Beginner
Beginner
Posts: 25
Joined: Fri Jul 17, 2015 2:22 pm

analyze_statistics on "Materialize" Step

Post by Victorgm » Wed Jan 27, 2016 11:44 pm

Does Vertica run analyze_statistics at the Materialize step in EXPLAIN?
Thanks for your help!

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: analyze_statistics on "Materialize" Step

Post by NorbertKrupa » Wed Jan 27, 2016 11:56 pm

It updates statistics on the data. If there is a materialize step in the plan, it's because the data needed to be materialized. If you don't think the columns should be materialized, do you have the proper encoding on them?
Checkout vertica.tips for more Vertica resources.

Victorgm
Beginner
Beginner
Posts: 25
Joined: Fri Jul 17, 2015 2:22 pm

Re: analyze_statistics on "Materialize" Step

Post by Victorgm » Tue Feb 02, 2016 10:56 pm

Thanks Norbert.

Perhaps I should get some clarification on a few things before proceeding:
1) What is materialization? Is it fair to call it an intermediate result set, between the original projection and the final result set?

2) What else prompts the materialization step, besides a need to encode or re-encode the columns (as implied by your reply)?

Basic questions, I'm sure, but haven't found any explanation in the docs or anywhere else online, so it would be nice to check my own assumptions here.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: analyze_statistics on "Materialize" Step

Post by NorbertKrupa » Wed Feb 03, 2016 5:09 am

Materialization means that compressed or encoded data must be decompressed or decoded. An encoding such as RLE may not need materialization.

Here's some reading:

Materialization Strategies in a Column-Oriented DBMS
Checkout vertica.tips for more Vertica resources.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: analyze_statistics on "Materialize" Step

Post by scutter » Wed Feb 03, 2016 2:53 pm

Materialization is really just the reading of data from the projection. Query plans may choose to do “early materialization”, which means reading data from the projection during the initial projection scan, or “late materialization” which means deferring the read until a later step of the execution plan. Consider for example a join where many rows are filtered out by the join. It would be wasteful to read all of the columns referenced in the query before the join is evaluated because many of those rows won’t be needed since they are filtered out. So only the join key is read, the join is executed, and after the join, the remaining columns are read from disk for the rows that are emitted from the join.

Vertica doesn’t “update statistics” during query execution. It’s up to you to update statistics after data loads, or using an automated job.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

Victorgm
Beginner
Beginner
Posts: 25
Joined: Fri Jul 17, 2015 2:22 pm

Re: analyze_statistics on "Materialize" Step

Post by Victorgm » Wed Feb 03, 2016 5:48 pm

Thanks Norbert - excellent write-up.
Thanks for the nice summary explanation, Sharon.

Post Reply

Return to “Vertica Performance Tuning”