We need your help/thoughts on one scenario.
We have a dimension table with 50M records of customer data, and we forecast this table to grow to 100 million records soon.
This table receives frequent updates as data arrives from multiple sources. The table has unique records.
Daily updates touch about 50K records, but each update scans the full table. We understand that in Vertica, updates are deletes plus inserts.
We have an 18-node cluster with K-safety = 1. The table is segmented across all 18 nodes and has 2 superprojections.
We are noticing that UPDATEs on this table are often time-consuming, and a few updates fail with the error message below:
Insufficient resources to execute plan on pool
Please help us understand how we can solve this issue:
a. Is segmenting the data across all 18 nodes causing these issues, and should we segment across only 4 to 5 nodes to get better performance and avoid the errors?
b. Will increasing the K-safety value help?
c. Also, we are using /*+DIRECT*/ for all DML statements. How does that impact performance?
Thanks in advance
> a. Is segmenting the data across all 18 nodes causing these issues, and should we segment across only 4 to 5 nodes to get better performance and avoid the errors?

I wouldn't recommend creating custom segmentation. This is most likely not what is causing your issues.
> b. Will increasing the K-safety value help?

No. That will only create additional projection containers whose rows will also need to be deleted and inserted when an update occurs.
> c. Also, we are using /*+DIRECT*/ for all DML statements. How does that impact performance?

Negatively. You can read my blog post on how using /*+DIRECT*/ affects performance. In short, you're skipping the WOS and going directly to ROS, which causes higher I/O at run time. You should try to optimize your projections for the UPDATE, or run the Database Designer (DBD) in query-specific mode with your UPDATE statement.
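As a rough sketch of what "optimize your projections for an UPDATE" can look like (table and column names here are hypothetical, not from your schema): a projection sorted and segmented on the key used in the UPDATE's WHERE clause lets the delete phase prune containers instead of scanning all 50M rows.

```sql
-- Hypothetical dimension table keyed by customer_id.
-- Sorting and segmenting on customer_id means the delete-then-insert
-- that an UPDATE performs can locate the ~50K target rows by key
-- rather than scanning the full table on every node.
CREATE PROJECTION customer_dim_upd AS
    SELECT customer_id, source_system, last_updated
    FROM customer_dim
    ORDER BY customer_id
    SEGMENTED BY HASH(customer_id) ALL NODES KSAFE 1;

-- A typical daily update against that key:
UPDATE customer_dim
   SET last_updated = SYSDATE()
 WHERE customer_id IN (SELECT customer_id FROM staging_updates);
```

Running the DBD in query-specific mode with your actual UPDATE statement will propose a projection design like this based on your real predicates.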
Check out vertica.tips for more Vertica resources.