Vertica partions and partitioning questions

Moderator: NorbertKrupa

Post Reply
amine
Newbie
Newbie
Posts: 13
Joined: Tue Jul 16, 2013 3:18 pm

Vertica partions and partitioning questions

Post by amine » Mon Sep 29, 2014 8:15 pm

I've been using Vertica 6.1.3 in a 3 nodes cluster for a while and time has come to partition the data to get the most out of it.
Reading through the documentation and the nature of the data that I am dealing with, it looks like it's best to partition data by year-month for the last 12 months, and have a script that merges the ROS containers for the partitions that are before 12 months ago. (unless there's a better way to do it within Vertica itself)
Although everything seems to be easy to find my way in and out, I still have some questions about some statements in the online documentation:

1/ from the doc: https://my.vertica.com/docs/6.1.x/HTML/ ... #10444.htm
While HP Vertica supports a maximum of 1024 partitions, few, if any, organizations will need to approach that maximum.at maximum.
I hope that this meant to say "Vertica supports a max of 1024 ROS containers (not partitions)", because the way I am going to (dynamically) add partitions as data comes in, then merge at the tail would hit the 1024 partitions at some point, but should be within 13 ROS containers (12 for each prev. 12 months, and 1 for the rest)

2/ from the doc: https://my.vertica.com/docs/6.1.x/HTML/ ... #14361.htm
By default, the Tuple Mover assumes that all loads and updates for partitioned tables are going to the same active partition. For example, if a table is partitioned by month, the Tuple Mover expects that after the start of a new month, no data is loaded into the partition for the prior month.
Obviously, this is assuming that the TM.ActivePartitionCount is set to 1.
My question regarding this is; what happens if I have 1 active partition set, but I am inserting data for more than 1 partition? (I guess I am trying to understand what would happen if the above TM expectation isn't met)

Thank you for your help!

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Vertica partions and partitioning questions

Post by NorbertKrupa » Mon Sep 29, 2014 9:20 pm

That's not correct; 1 partition does not mean 1 ROS container (you can have multiple projections).

The target range for partitions should be around 10-20. Of course this isn't really realistic. If you have 24 months of data, partitioning on month (year + month) makes the most sense. However, depending on the data retention and other use cases, you may want to use weekly partitioning which would end up being 104 partitions for 24 months of data.

Vertica will automatically reorganize partitions if REORGANIZE is passed with the ALTER TABLE statement and take care of data added to previous partitions. However, the latter should be avoided as the entire ROS container will need to be rebuilt.
Checkout vertica.tips for more Vertica resources.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Vertica partions and partitioning questions

Post by scutter » Mon Sep 29, 2014 11:48 pm

Don’t get into the habit of using MERGE_PARTITIONS(). It’s deprecated in 7.0 and retired in 7.1.

You can use a partitioning expression that groups together partitions that are past some chosen date (use a CASE statement for example). It’s common to want daily partitioning for recent data - so that a full day’s loads can be stripped off if needed by dropping the partition or moving it to another table, rather than deleting data. And especially with the 7.1 partitioning swapping feature where you might want to work on a partition and swap it in/out. But daily partitioning is often too granular if a large amount of data is being retained. Change the partitioning expression as needed and alter the partitoning as needed.

Btw - in case you’re unclear on how partitions work in Vertica - the partition count is number of actual partitions - so if you have multiple partition keys in a single “partition” or ROS, that’s one partition.

Tuple mover and active partitions. The issue there is simply the tuple mover’s desire to keep partitions basically at a single ROS. It doesn’t do that as aggressively these days as it has in the past, but if you do multiple large loads across lots of partitions, the TM is going to work on merging the “inactive” partitions back to a single ROS. So it will result in a lot of i/o if you do this often. It’s worth keeping this in mind when back-filling or bulk loading a table - best to load by partition or range of partitions, rather than loading random sets of data that spread out across many partitions.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

amine
Newbie
Newbie
Posts: 13
Joined: Tue Jul 16, 2013 3:18 pm

Re: Vertica partions and partitioning questions

Post by amine » Tue Sep 30, 2014 3:52 pm

@norbertk
Thank you for your input. And I even think reaching 1024 year+month will take a while, and by then, the "old" data can be moved out of the partitions to a separate tables.

@scutter
Thanks for the advice, and will definitely not use the MERGE_PARTITIONS() knowing it has been deprecated (and we'll be upgrading soon).
but if you do multiple large loads across lots of partitions, the TM is going to work on merging the “inactive” partitions back to a single ROS
Correct me if I am wrong, but shouldn't I be able to control how often to MoveOut and MergeOut by setting those values in the TM configuration? (this is obviously assuming that data loaded to the WOS is small enough to be there until the MoveOut is called)

Thank you again for your replies!

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Vertica partions and partitioning questions

Post by scutter » Tue Sep 30, 2014 11:47 pm

You can configure how often the TM starts its moveout and mergeout work. I was referring to larger loads that are either DIRECT or spill to ROS. Large loads will tend to take a while, so you could definitely be loading through multiple TM mergeout cycles and make the TM want to merge the same partition multiple times.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Vertica partions and partitioning questions

Post by NorbertKrupa » Thu Oct 02, 2014 1:49 am

Just wanted to throw this Community post into the mix as it has some relevant conversations.
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “Vertica Database Administration”