Partition Pruning on Date Columns Gotcha

Moderator: NorbertKrupa

Post Reply
sramdram
Newbie
Newbie
Posts: 23
Joined: Fri Jul 12, 2013 6:46 pm

Partition Pruning on Date Columns Gotcha

Post by sramdram » Sat Aug 10, 2013 7:59 pm

If I create a table with a partition like

partition by date_trunc('day',dt) where dt is a timestamp columns...

then queries like

where dt >= date_trunc('day',current_timestamp) will be slower than

dt >= date_trunc('day',current_timestamp::timestamp)

This is also true for postgres...I would expect the Vertica optimizer to be smart about the partition key and be more efficient

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: Partition Pruning on Date Columns Gotcha

Post by nnani » Mon Aug 12, 2013 1:15 pm

Hello sramdram,
While Vertica supports a maximum of 1024 partitions, few, if any, organizations will need to approach that maximum. Fewer partitions are likely to meet your business needs, while also ensuring maximum performance. Many customers, for example, partition their data by month, bringing their partition count to 12. Vertica recommends you keep the number of partitions between 10 and 20 to achieve excellent performance.
So partitioning by date is not recommended in Vertica


Thanks
nnani........
Long way to go

You can check out my blogs at vertica-howto

sramdram
Newbie
Newbie
Posts: 23
Joined: Fri Jul 12, 2013 6:46 pm

Re: Partition Pruning on Date Columns Gotcha

Post by sramdram » Mon Aug 12, 2013 4:30 pm

I dont agree

Try purging a day's worth of 2 billion log entries on a daily basis without partitions.

You have to partition by date to have a smooth way to purge historical data.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1806
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Partition Pruning on Date Columns Gotcha

Post by JimKnicely » Tue Aug 13, 2013 1:22 pm

How many historical days do you keep in the table? It's okay to partition by day if you only keep a significantly less number of actual days than 1,024 :)
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: Partition Pruning on Date Columns Gotcha

Post by nnani » Wed Aug 14, 2013 11:28 am

Agreed with Jim,

It all depends on the number of partitions / number of unique dates in your records.
Off course Partition pruning is the fastest way to delete data in Vertica, but you cannot exceed partitions to be more then 1024.

That means you can only have around 3 years of data in your table, So if your data retention policy for the project satisfies with the condition then it is the right choice for you. :)
nnani........
Long way to go

You can check out my blogs at vertica-howto

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Partition Pruning on Date Columns Gotcha

Post by NorbertKrupa » Wed Apr 02, 2014 5:59 pm

If I'm partitioning by month, it should be a year + month as in the documentation, correct?

Code: Select all

PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
What is the difference between using date_part and EXTRACT?

Code: Select all

PARTITION BY (("#date_part"('year', table.tdate) * 100) + "#date_part"('month', table.tdate))
It seems that they're identical according to the documentation
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “Vertica Performance Tuning”