Talk to the people setting up your cluster. Vertica has a cookbook that describes how to configure a DL-380 box for a Vertica cluster.
http://www.vertica.com/resources-for-pl ... -hardware/
Share it with them and check after they're done to ensure that they followed the cookbook.
Next, run some tests to measure network and disk performance. Ask your network engineer what else can be done to improve network speeds across the nodes in the cluster.
Run the validation scripts that come with Vertica to ensure performance is up to spec.
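For example, the validation tools ship with the server install (under /opt/vertica/bin in a default install; exact flags vary by version, so check --help):
Code: Select all
# Paths assume a default install; adjust to your environment.
/opt/vertica/bin/vcpuperf                 # CPU speed check
/opt/vertica/bin/vioperf /data/vertica    # disk throughput against your data directory
/opt/vertica/bin/vnetperf                 # network latency/throughput between nodes
Compare the numbers against the thresholds in the documentation before going further.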
http://my.vertica.com/docs/7.1.x/HTML/i ... cripts.htm
After installing Vertica, make sure that connection load balancing is set up.
http://my.vertica.com/docs/7.1.x/HTML/i ... ancing.htm
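Native connection load balancing must be enabled on the server side, and clients must also opt in (for example, vsql's -C flag). A minimal server-side setup, using the round-robin policy as one option:
Code: Select all
-- Distribute incoming client connections across nodes
SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');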
Validate connection load balancing by looking at the query_requests system table to ensure that user queries are spread evenly across all of the nodes in the cluster.
Code: Select all
SELECT user_name, node_name, count(*) AS NumRecs
FROM query_requests
GROUP BY 1,2
ORDER BY 1,2;
Set up resource pools.
http://www.vertica.com/2014/12/16/ad-ho ... he-rescue/
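As a sketch, an ad-hoc pool that caps concurrency and queue time might look like this (the pool name, user name, and numbers are made up; tune them to your workload):
Code: Select all
CREATE RESOURCE POOL adhoc_pool
  MEMORYSIZE '20%'     -- reserve a slice of memory for ad-hoc work
  MAXCONCURRENCY 10    -- limit simultaneous ad-hoc queries
  QUEUETIMEOUT 300;    -- seconds a query waits in the queue before erroring out

-- Route a BI service account to the pool
ALTER USER bi_user RESOURCE POOL adhoc_pool;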
Once the environment is set up, the next thing to focus on is ensuring that queries finish quickly. Here's why:
http://www.vertica.com/2014/07/17/workl ... -triangle/
I'm assuming your BI strategy is to have the BI tool send SQL queries to the database rather than cache the data on the BI server. If possible, choose a BI tool that was designed to write optimized SQL for very large databases (VLDBs). For example, MicroStrategy has a VLDB connector for Vertica. Other BI tools I've looked at have Vertica connectors, but they don't write optimized SQL and need a BI and SQL tuning expert to perform well when deployed on top of a complex data model.
Take some time to read through the Administrator's Guide, especially the sections on projections, partitions, statistics, and analyzing workloads. Fact tables should be segmented, and you may need more than one projection per table.
http://my.vertica.com/docs/7.1.x/HTML/i ... sGuide.htm
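To illustrate, a fact table partitioned by month, with a projection segmented across all nodes and fresh statistics, might look like this (table and column names are made up):
Code: Select all
CREATE TABLE fact_sales (
  sale_date DATE NOT NULL,
  store_id  INT,
  amount    NUMERIC(12,2)
)
PARTITION BY EXTRACT(YEAR FROM sale_date) * 100 + EXTRACT(MONTH FROM sale_date);

CREATE PROJECTION fact_sales_p1 AS
  SELECT * FROM fact_sales
  ORDER BY sale_date, store_id       -- sort order drives compression and scans
  SEGMENTED BY HASH(store_id) ALL NODES KSAFE 1;

-- Keep the optimizer's statistics current
SELECT ANALYZE_STATISTICS('fact_sales');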
Sections to pay attention to include:
* Scalability & Concurrency.
http://my.vertica.com/docs/7.1.x/HTML/i ... Tuning.htm
* Sessions and the MaxClientSessions parameter.
http://my.vertica.com/docs/7.1.x/HTML/i ... ssions.htm
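MaxClientSessions is a per-node cap (the default is 50). It can be raised with SET_CONFIG_PARAMETER; the value here is illustrative:
Code: Select all
-- Raise the per-node session cap; size it for your expected BI concurrency
SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 100);

-- Confirm the new setting
SELECT parameter_name, current_value
FROM configuration_parameters
WHERE parameter_name = 'MaxClientSessions';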
* How to Optimize Query Performance:
http://my.vertica.com/docs/7.1.x/HTML/i ... rmance.htm
Last of all, create and deploy a physical data model that is optimized for your workload. Hopefully your data model has only one table.
If that's not the case, it may take a couple of iterations to optimize the physical data model. The best data model for BI tool performance is one where all of the data has been denormalized into one table. If you need even better performance, build live aggregate projections on top of that table. BI tools that can't write optimized SQL still perform well against a one-table data model.
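A live aggregate projection on a denormalized fact table (here called fact_sales; names and columns are illustrative) is roughly this shape. Queries whose GROUP BY matches the projection are answered from the precomputed aggregates; note the anchor table must be segmented, and LAP restrictions apply:
Code: Select all
CREATE PROJECTION sales_daily_agg AS
  SELECT sale_date, store_id,
         SUM(amount) AS total_amount,
         COUNT(*)    AS num_sales
  FROM fact_sales
  GROUP BY sale_date, store_id;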
If you have additional Linux boxes, you can add them to the cluster and then rebalance the cluster to see how performance scales with the number of nodes. Or take away a node and see what happens.
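After adding the new hosts through the install tooling, the rebalance itself can be kicked off from SQL:
Code: Select all
-- Redistribute data across the new node set
SELECT REBALANCE_CLUSTER();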
Talk to your Vertica Sales Rep. They can put you in touch with experts who can help with performance tuning.
Regards,
Doug