Page 1 of 1

How is table data stored on disk?

Posted: Wed Mar 07, 2012 2:02 pm
by Jbaskin
Hello, Does anyone know how Vertica stores the data in tables on disk? The directories in the file system are just a bunch of confusing numbers. Thanks.

Re: How is table data stored on disk?

Posted: Thu Mar 22, 2012 1:30 pm
by JimKnicely
Hi jsbaskin,

Projection data in Verica is stored in ROS (Read Optimized Storage) containers on disk. The root location on disk where your data is located can be found by querying the DISK_STORAGE system table:

Code: Select all

dbadmin=> select storage_path from v_monitor.disk_storage where storage_usage = 'DATA';
                  storage_path
-------------------------------------------------
 /data1/data/intersect/v_intersect_node0001_data
 /data2/data/intersect/v_intersect_node0001_data
(2 rows)
In the storage path directory on disk you'll see a bunch of sub-directories having numbered names ranging from 001 up to 999. Each directory is a ROS container. In each of those directories are additional sub-directories named by using a unique identifier called the projection storage uid. In these sub-directories is where you'll find your data files.

For example, I have a table named holiday_dim. Say I want to find out where on disk its data files are located.

First I would need to find out which projections have been created for the table. To do that I would query the PROJECTION_STORAGE system table:

Code: Select all

dbadmin=> select projection_name, row_count, ros_count from v_monitor.projection_storage where anchor_table_name = 'holiday_dim';
   projection_name    | row_count | ros_count
----------------------+-----------+-----------
 holiday_dim_node0001 |        60 |         1
 holiday_dim_node0002 |        60 |         1
 holiday_dim_node0003 |        60 |         1
(3 rows)
Now that I know the projection name (holiday_dim_node0001) on node 1, I can find out the ROS container's unique id by querying the STORAGE_CONTAINERS system table:

Code: Select all

dbadmin=> select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'holiday_dim_node0001';
 storage_type |    storage_oid
--------------+-------------------
 ROS          | 45035996274585219
(1 row)
The storage_oid is the unique numeric ID assigned by the Vertica catalog that identifies the storage location. It's basically the directory name on disk.

Since I know that my data is stored in the location /data1/data/intersect/v_intersect_node0001_data on disk, I just need to find the ROS container directory under there... For that I can use the Linux find command:

Code: Select all

bash-3.2$ pwd
/data1/data/intersect/v_intersect_node0001_data
bash-3.2$ find ./ -name 45035996274585219
./219/45035996274585219
bash-3.2$ cd 219/45035996274585219
bash-3.2$ ls -lrt
total 8
-rw------- 1 dbadmin dbadmin  48 Mar 21 22:31 45035996274585219_0.pidx
-rw------- 1 dbadmin dbadmin 134 Mar 21 22:31 45035996274585219_0.fdb
There are always two files located in the directory for a projection. One file has a .fdb extension and the other has a .pidx extension. The .fdb file is the data itself (encoded and compressed) while the .pidx file contains index/statistics info from the data file.

Notice that the last three digits of the projection directory represents the ROS container's directory name:
  • /data1/data/intersect/v_intersect_node0001_data/219/45035996274585219
Have fun!

Re: How is table data stored on disk?

Posted: Wed Apr 11, 2012 10:41 am
by man
Hi!
I have an another question...how works the projections data files creation? There is one data file only that stores data? Or is it possible to store data of projection in different files?
And...
I read something about automatic managemet of storage location...but obviusly if there is one file per projection's data it is always stored on one storage location...is it correct?

Tnx!

Re: How is table data stored on disk?

Posted: Fri Apr 13, 2012 1:07 pm
by JimKnicely
Hi man,

Great question. I believe that projection data is stored in one file per node unless the projection's anchor table was created using the PARTITION clause... in that case, there will be one file for each partition. When you create a table with the PARTITION clause then all projections based on that table are also partitioned.

To demonstrate, let's create a partitioned table:

Code: Select all

dbadmin=> create table test1 (col1 int, col2 varchar2(100), primary key (col1)) partition by col1;
CREATE TABLE
dbadmin=> insert into test1 values (1, 'Pittsburgh');
 OUTPUT
--------
      1
(1 row)

dbadmin=> insert into test1 values (2, 'Toronto');
 OUTPUT
--------
      1
(1 row)

dbadmin=> insert into test1 values (3, 'Dallas');
 OUTPUT
--------
      1
(1 row)
We can check the v_monitor.projection_storage system table to find the table's projections:

Code: Select all

dbadmin=> select projection_name, row_count, ros_count from v_monitor.projection_storage where anchor_table_name = 'test1';
 projection_name | row_count | ros_count
-----------------+-----------+-----------
 test1_node0001  |         3 |         3
 test1_node0002  |         3 |         3
 test1_node0003  |         3 |         3
(3 rows)
Then we can find the storage containers used by the "test1_node0001" projection via the v_monitor.storage_containers system table:

Code: Select all

dbadmin=> select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'test1_node0001';
 storage_type |    storage_oid
--------------+-------------------
 ROS          | 45035996276572379
 ROS          | 45035996276572389
 ROS          | 45035996276572399
(3 rows)
We see that there are three ROS containers for the single projection, one ROS for each partition.

If we go out the OS we'll also see that there are three separate .fdb data files, one for each ROS and therefore one for each partition:

Code: Select all

verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./379/45035996276572379
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572379_0.pidx
-rw------- 1 dbadmin dbadmin  3 Apr 13 07:17 45035996276572379_0.fdb
verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./389/45035996276572389
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572389_0.pidx
-rw------- 1 dbadmin dbadmin  3 Apr 13 07:17 45035996276572389_0.fdb
verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./399/45035996276572399
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572399_0.pidx
-rw------- 1 dbadmin dbadmin  3 Apr 13 07:17 45035996276572399_0.fdb
I hope this helps!

Re: How is table data stored on disk?

Posted: Tue Apr 23, 2013 8:04 am
by vicky
When query is running, What happens if node fails? I mean regarding performance

Re: How is table data stored on disk?

Posted: Wed Dec 16, 2015 8:04 pm
by becky
Hi guys,

I upgraded to 7.2. I not

I thought that the only files asscoiated with data are the .fb and .pidx files. But I now see .gt files:

Code: Select all

[dbadmin@BK 277]$ ls -lrt
total 4
-rw------- 1 dbadmin verticadba 438 Dec 16 13:50 0267ea83948962386adc79978bf46adc00a0000000024745_0.gt
Anyone know what these .gt files are?

Re: How is table data stored on disk?

Posted: Wed Dec 16, 2015 8:35 pm
by NorbertKrupa
Coming in 7.3 there will be integration with Grand Theft Auto.