Hi all,
I've found a lot of info in the admin guide that explains how to set up the ordering of columns in a projection, but not too much info about what columns to segment on. It it best just to segment by the primary key or the column with the highest cardinality? I think that is correct.... But if do segment by PK I get a lot of rows in the query_events table that have a description of "Many rows were resegmented during plan execution." I assume hat's bad...
How to segment a projection
Moderator: NorbertKrupa
How to segment a projection
Have a GREAT day!
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: How to segment a projection
Technically, the primary key should have the highest cardinality because it has the most unique values. Vertica suggests to use the PK in the documentation. Otherwise, you find the highest cardinality column(s) and use those.
Checkout vertica.tips for more Vertica resources.
Re: How to segment a projection
#norbertk,
Thanks for your response! That's what I figured
Thanks for your response! That's what I figured
Have a GREAT day!
Re: How to segment a projection
Sarah,
It’s a really good thing to understand how projections are optimized - what’s a good sort order, segmentation, etc. But in the end, using Database Designer to design projections is preferred at least as a first step. Have you tried using Designer?
—Sharon
It’s a really good thing to understand how projections are optimized - what’s a good sort order, segmentation, etc. But in the end, using Database Designer to design projections is preferred at least as a first step. Have you tried using Designer?
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC
Re: How to segment a projection
Not a super-easy thing to answer, since there's a lot that goes into this, but in general, the above is all accurate. But even segmenting on a very cardinal PK might not be optimum, depending on what you're trying to accomplish. For example, even having perfect distribution of data through a PK segmentation, you could still get a lot of resegmentation if your query is trying to GROUP BY some logical value, like country_code. If GROUP BY performance is what you're after, you could try creating a projection that's segmented on the logical column instead. That creates local group bys, but beware - it could also create a lot of data skew in your cluster. Tread carefully.