Page 1 of 1
Partition Pruning on Date Columns Gotcha
Posted: Sat Aug 10, 2013 7:59 pm
by sramdram
If I create a table with a partition like
partition by date_trunc('day',dt) where dt is a timestamp columns...
then queries like
where dt >= date_trunc('day',current_timestamp) will be slower than
dt >= date_trunc('day',current_timestamp::timestamp)
This is also true for postgres...I would expect the Vertica optimizer to be smart about the partition key and be more efficient
Re: Partition Pruning on Date Columns Gotcha
Posted: Mon Aug 12, 2013 1:15 pm
by nnani
Hello sramdram,
While Vertica supports a maximum of 1024 partitions, few, if any, organizations will need to approach that maximum. Fewer partitions are likely to meet your business needs, while also ensuring maximum performance. Many customers, for example, partition their data by month, bringing their partition count to 12. Vertica recommends you keep the number of partitions between 10 and 20 to achieve excellent performance.
So partitioning by date is not recommended in Vertica
Thanks
Re: Partition Pruning on Date Columns Gotcha
Posted: Mon Aug 12, 2013 4:30 pm
by sramdram
I dont agree
Try purging a day's worth of 2 billion log entries on a daily basis without partitions.
You have to partition by date to have a smooth way to purge historical data.
Re: Partition Pruning on Date Columns Gotcha
Posted: Tue Aug 13, 2013 1:22 pm
by JimKnicely
How many historical days do you keep in the table? It's okay to partition by day if you only keep a significantly less number of actual days than 1,024
Re: Partition Pruning on Date Columns Gotcha
Posted: Wed Aug 14, 2013 11:28 am
by nnani
Agreed with Jim,
It all depends on the number of partitions / number of unique dates in your records.
Off course Partition pruning is the fastest way to delete data in Vertica, but you cannot exceed partitions to be more then 1024.
That means you can only have around 3 years of data in your table, So if your data retention policy for the project satisfies with the condition then it is the right choice for you.
Re: Partition Pruning on Date Columns Gotcha
Posted: Wed Apr 02, 2014 5:59 pm
by NorbertKrupa
If I'm partitioning by month, it should be a year + month as in the
documentation, correct?
Code: Select all
PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
What is the difference between using date_part and EXTRACT?
Code: Select all
PARTITION BY (("#date_part"('year', table.tdate) * 100) + "#date_part"('month', table.tdate))
It seems that they're identical according to the
documentation