Segmentation on Column with NULL values

Moderator: NorbertKrupa

Post Reply
sarah
Intermediate
Intermediate
Posts: 77
Joined: Mon Aug 27, 2012 1:34 pm

Segmentation on Column with NULL values

Post by sarah » Fri Apr 11, 2014 1:46 pm

Hello,

I have a huge table that has a column with a high cardinality, but it also has many (1 million +) NULL values.

Does anyone know if there is a performance hit if I would segment on a column like that, i.e., where there a many NULLs? Do all of the NULLs go to one node?

Thanks!
Have a GREAT day!

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Segmentation on Column with NULL values

Post by NorbertKrupa » Fri Apr 11, 2014 1:57 pm

Well, in general it's a bad idea as it could lead to data skew. If that happens, data won't be evenly distributed and data won't be able to be accessed as fast since one node may have more data than another. I would try to find the next column or columns that have a high cardinality without NULLs.
Checkout vertica.tips for more Vertica resources.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Segmentation on Column with NULL values

Post by scutter » Fri Apr 11, 2014 2:05 pm

And sometimes accepting that skew is ok. For example, to get a merge join instead of a hash join, or to avoid a lot of dynamic resegmentation of data during query execution. It all depends on the use case.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

Post Reply

Return to “New to Vertica Database Administration”