Data vs compliance capacity

Moderator: NorbertKrupa

Post Reply
wsanders
Newbie
Newbie
Posts: 2
Joined: Tue Mar 03, 2015 6:42 pm

Data vs compliance capacity

Post by wsanders » Mon Mar 09, 2015 8:10 pm

We have a three node cluster, get_compliance_status() says it's 38% full. It looks like the 1TB community license limit is for total replicated space, but the real amount of data that can be stored is somewhat less then 1/4 of that. It's counting the data twice for each node:

At 38% of 1TB capacity:

dbadmin=> select storage_path, storage_usage, disk_space_used_mb, disk_space_used_mb / 1024 "disk_space_used_mb (GB)" from disk_storage;
storage_path | storage_usage | disk_space_used_mb | disk_space_used_mb (GB)
------------------------------------------------+---------------+--------------------+-------------------------
/home/dbadmin/dw/v_dw_node0001_catalog/Catalog | CATALOG | 95450 | 93.212890625000000000
/home/dbadmin/dw/v_dw_node0001_data | DATA,TEMP | 95450 | 93.212890625000000000
/home/dbadmin/dw/v_dw_node0002_catalog/Catalog | CATALOG | 95387 | 93.151367187500000000
/home/dbadmin/dw/v_dw_node0002_data | DATA,TEMP | 95387 | 93.151367187500000000
/home/dbadmin/dw/v_dw_node0003_catalog/Catalog | CATALOG | 95380 | 93.144531250000000000
/home/dbadmin/dw/v_dw_node0003_data | DATA,TEMP | 95380 | 93.144531250000000000

The Catalog directory is just a few MB. The actual amount of disk space used on each node is only 46GB, I am assuming it's compressed:

# du -sh /home/dbadmin/dw/v_dw_node0001_data
46G /home/dbadmin/dw/v_dw_node0001_data
# du -sh /home/dbadmin/dw/v_dw_node0001_catalog/Catalog
48M /home/dbadmin/dw/v_dw_node0001_catalog/Catalog

Is this the expected behavior? I expected to get 300GB data on each node before the 1TB community compliance limit was reached.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Data vs compliance capacity

Post by JimKnicely » Mon Mar 09, 2015 8:28 pm

Hi,

The license of 1 TB is for the raw (uncompressed) database size. The data on disk is very highly compressed. Vertica does not count data elements twice against the license, only once.

Check out Norbert Krupa's tip which includes a query you can run to get the estimated raw size and compressed size of your database:

http://vertica.tips/2014/01/24/license- ... -raw-size/
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

wsanders
Newbie
Newbie
Posts: 2
Joined: Tue Mar 03, 2015 6:42 pm

Re: Data vs compliance capacity

Post by wsanders » Mon Mar 09, 2015 9:39 pm

Looks like the info in "disk_storage" is basically useless. A better query:

select node_name,sum(used_bytes) / 1024 / 1024 / 1024 as GB from v_monitor.storage_containers group by node_name;

returns the total size of compressed data on disk, which corresponds roughly to the raw_estimate_gb returned by the query in the tip.

The compression ratio for our data appears to be about 2.9 to 1.

And replication also seems to count against the license limit, so we should hit the 1TB limit while using only about 120GB disk space on each node.
Last edited by wsanders on Mon Mar 09, 2015 10:30 pm, edited 1 time in total.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Data vs compliance capacity

Post by JimKnicely » Mon Mar 09, 2015 9:57 pm

You may want to monitor DISK_STORAGE so that you don't run out of physical disk space :)

A particular table can take up a lot more disk space than what is calculated as the tables "raw" size depending on the number of projection it has.

I don't think that replicated data is counted more than once... You can use the AUDIT() function to estimate the raw data size of a database, a schema, a projection, or a table as it is counted in an audit of the database size.

Replicated Table:

Code: Select all

dbadmin=> create table test as select * from columns unsegmented all nodes;
CREATE TABLE

dbadmin=> select count(*) from projections where anchor_table_name = 'test';
 count
-------
     3
(1 row)

dbadmin=> select sum(used_bytes) from projection_storage where anchor_table_name = 'test';
  sum
--------
 147456
(1 row)

dbadmin=> select audit('test');
 audit
-------
 24149
(1 row)
Segmented Table:

Code: Select all

dbadmin=> drop table test;
DROP TABLE

dbadmin=> select count(*) from projections where anchor_table_name = 'test';
 count
-------
     2
(1 row)

dbadmin=> create table test as select * from columns segmented by hash(table_name) all nodes;
CREATE TABLE

dbadmin=> select sum(used_bytes) from projection_storage where anchor_table_name = 'test';
  sum
--------
 163840
(1 row)

dbadmin=> select audit('test');
 audit
-------
 24149
(1 row)
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “New to Vertica Database Administration”