If I create a table with a partition like
partition by date_trunc('day',dt) where dt is a timestamp columns...
then queries like
where dt >= date_trunc('day',current_timestamp) will be slower than
dt >= date_trunc('day',current_timestamp::timestamp)
This is also true for postgres...I would expect the Vertica optimizer to be smart about the partition key and be more efficient
Partition Pruning on Date Columns Gotcha
Moderator: NorbertKrupa
Re: Partition Pruning on Date Columns Gotcha
Hello sramdram,
Thanks
So partitioning by date is not recommended in VerticaWhile Vertica supports a maximum of 1024 partitions, few, if any, organizations will need to approach that maximum. Fewer partitions are likely to meet your business needs, while also ensuring maximum performance. Many customers, for example, partition their data by month, bringing their partition count to 12. Vertica recommends you keep the number of partitions between 10 and 20 to achieve excellent performance.
Thanks
Re: Partition Pruning on Date Columns Gotcha
I dont agree
Try purging a day's worth of 2 billion log entries on a daily basis without partitions.
You have to partition by date to have a smooth way to purge historical data.
Try purging a day's worth of 2 billion log entries on a daily basis without partitions.
You have to partition by date to have a smooth way to purge historical data.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Partition Pruning on Date Columns Gotcha
How many historical days do you keep in the table? It's okay to partition by day if you only keep a significantly less number of actual days than 1,024
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: Partition Pruning on Date Columns Gotcha
Agreed with Jim,
It all depends on the number of partitions / number of unique dates in your records.
Off course Partition pruning is the fastest way to delete data in Vertica, but you cannot exceed partitions to be more then 1024.
That means you can only have around 3 years of data in your table, So if your data retention policy for the project satisfies with the condition then it is the right choice for you.
It all depends on the number of partitions / number of unique dates in your records.
Off course Partition pruning is the fastest way to delete data in Vertica, but you cannot exceed partitions to be more then 1024.
That means you can only have around 3 years of data in your table, So if your data retention policy for the project satisfies with the condition then it is the right choice for you.
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Partition Pruning on Date Columns Gotcha
If I'm partitioning by month, it should be a year + month as in the documentation, correct?
What is the difference between using date_part and EXTRACT?
It seems that they're identical according to the documentation
Code: Select all
PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
Code: Select all
PARTITION BY (("#date_part"('year', table.tdate) * 100) + "#date_part"('month', table.tdate))
Checkout vertica.tips for more Vertica resources.