Hello,
I have a huge table that has a column with a high cardinality, but it also has many (1 million +) NULL values.
Does anyone know if there is a performance hit if I would segment on a column like that, i.e., where there a many NULLs? Do all of the NULLs go to one node?
Thanks!
Segmentation on Column with NULL values
Moderator: NorbertKrupa
Segmentation on Column with NULL values
Have a GREAT day!
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Segmentation on Column with NULL values
Well, in general it's a bad idea as it could lead to data skew. If that happens, data won't be evenly distributed and data won't be able to be accessed as fast since one node may have more data than another. I would try to find the next column or columns that have a high cardinality without NULLs.
Checkout vertica.tips for more Vertica resources.
Re: Segmentation on Column with NULL values
And sometimes accepting that skew is ok. For example, to get a merge join instead of a hash join, or to avoid a lot of dynamic resegmentation of data during query execution. It all depends on the use case.
—Sharon
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC