Measuring and Setting your Disk Performance

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Measuring and Setting your Disk Performance

Post by JimKnicely » Tue Mar 13, 2012 9:11 pm

There are two storage management functions in Vertica that allow us to measure and record the performance characteristics (Throughput and Latency) of our individual storage locations. It's especially important to gather this information if you have multiple storage locations on the same node. Vertica will use this knowledge later when determining where (preferably on the fastest location) to store the data that is accessed the most.

The function that measures the storage location performance is named MEASURE_LOCATION_PERFORMANCE while the function that is used to record the performance information is named SET_LOCATION_PERFORMANCE.

Basically you have to perform a two step process. First measure a location performance and second record the values returned from step one.

You can query the DISK_STORAGE table to get a list of storage locations. You only need to measure and set the performance ratings for data locations.

Code: Select all

dbadmin=> SELECT node_name, storage_path FROM disk_storage WHERE node_name = 'v_intersect_node0001' AND storage_usage = 'DATA';
      node_name       |                  storage_path
----------------------+-------------------------------------------------
 v_intersect_node0001 | /data1/data/intersect/v_intersect_node0001_data
 v_intersect_node0001 | /data2/data/intersect/v_intersect_node0001_data
(2 rows)
We have two data storage locations on node 1 so I'll need to measure the performance of each and then inform Vertica of the measured performance of each.

Code: Select all

dbadmin=> SELECT MEASURE_LOCATION_PERFORMANCE('/data1/data/intersect/v_intersect_node0001_data', 'v_intersect_node0001');
WARNING:  measure_location_performance can take a long time. Please check logs for progress
           MEASURE_LOCATION_PERFORMANCE
---------------------------------------------------
 Throughput : 634 MB/sec. Latency : 1200 seeks/sec
(1 row)

dbadmin=> SELECT SET_LOCATION_PERFORMANCE('/data1/data/intersect/v_intersect_node0001_data', 'v_intersect_node0001', 634, 1200);
                            SET_LOCATION_PERFORMANCE
--------------------------------------------------------------------------------
 /data1/data/intersect/v_intersect_node0001_data performance statistics stored.
(1 row)

[code]dbadmin=> SELECT MEASURE_LOCATION_PERFORMANCE('/data2/data/intersect/v_intersect_node0001_data', 'v_intersect_node0001');
WARNING:  measure_location_performance can take a long time. Please check logs for progress
           MEASURE_LOCATION_PERFORMANCE
---------------------------------------------------
 Throughput : 218 MB/sec. Latency : 240 seeks/sec
(1 row)

dbadmin=> SELECT SET_LOCATION_PERFORMANCE('/data2/data/intersect/v_intersect_node0001_data', 'v_intersect_node0001', 218, 240);
                            SET_LOCATION_PERFORMANCE
--------------------------------------------------------------------------------
 /data2/data/intersect/v_intersect_node0001_data performance statistics stored.
(1 row)
You should execute both functions on all data storage locations on each of the nodes in your cluster. Too bad there isn't an option to combine the two function calls into a single SELECT statement. Oddly, they each have to be part of a top level SELECT.

Also, Vertica isn't kidding with the warning message stating that the measurement process can take a long time, especially with larger and slower disks!Try to run it during non-peek hours. When I ran the MEASURE_LOCATION_PERFORMANCE in the above examples, it took about 5 minutes for the first 500 GB location and 20 minutes for the second 1.5 TB location.
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica Tips, Lessons and Examples”