Accurate Projection Row Count

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Accurate Projection Row Count

Post by JimKnicely » Tue Sep 17, 2013 5:49 pm

FYI...

Here is a query that should give you accurate projection row counts:

Code: Select all

SELECT ps.projection_schema schema_name,
       ps.anchor_table_name table_name,
       CASE
         WHEN p.node_name IS NOT NULL THEN ps.row_count
         ELSE (SELECT SUM(ps2.row_count)
                 FROM projection_storage ps2
                WHERE ps2.projection_id = ps.projection_id)
         END row_count
  FROM projection_storage ps
  JOIN projections p
    ON ps.projection_id = p.projection_id
 GROUP BY 1, 2, 3
 ORDER BY 1, 2;
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Accurate Projection Row Count

Post by scutter » Tue Sep 17, 2013 6:38 pm

Hi Jim,

The results of this can actually be misleading. The projection_storage row count is all rows regardless of whether or not they've been deleted. Whether or not a deleted row is still present in a ROS container will depend on the AHM and whether the table has been purged or a ROS container merged by the tuple mover. So the projection_storage row counts compared between two different clusters can vary even if the count(*) output is the same.

The same is true when comparing row counts for two buddy segments between nodes - it's possible that one node's ROS containers have been merged/purged when the other node with the same segment haven't.

--Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Accurate Projection Row Count

Post by id10t » Tue Sep 17, 2013 7:28 pm

Hi Jim!

I'm agree with Sharon: my additional argument is that row_count in PROJECTION_STORAGE depends on statistics and AnalyzeRowCountInterval parameter.
Vertica automatically runs every 60 (default value of AnalyzeRowCountInterval parameter) seconds to collect the number of rows in the projection and aggregates row counts calculated during loads. So it possible that value of row_count will not be "accurate".

PS
Of cause Sharon's arguments much more stronger than my.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Accurate Projection Row Count

Post by scutter » Wed Sep 18, 2013 1:39 pm

Fyi - I believe that statistics (and AnalyzeRowCount) are independent of row_count in PROJECTION_STORAGE. That row_count comes straight from the ROS containers.
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Accurate Projection Row Count

Post by id10t » Wed Sep 18, 2013 1:50 pm

Hi!

@sharon
You are right about AnalyzeRowCount and ROW_COUNT. My fail :)

One of Shoaron's arguments:

Code: Select all

daniel=> select count(*) from test;
 count 
-------
    14
(1 row)

daniel=> select schema_name,projection_name,deleted_row_count from delete_vectors;
 schema_name | projection_name | deleted_row_count 
-------------+-----------------+-------------------
 public      | test_super      |                 2
(1 row)

daniel=> select row_count from projection_storage where anchor_table_name = 'test';
 row_count 
-----------
        16
(1 row)

@jim
Don't read my previous post :)))

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Accurate Projection Row Count

Post by JimKnicely » Wed Sep 18, 2013 7:12 pm

Hi guys!

Thank you for the great feed back!

Do you think this is a more viable alternative query to get an accurate projection row count?

Code: Select all

SELECT p.projection_schema,
       p.projection_name,
       p.anchor_table_name,
       NVL(CASE
             WHEN p.node_name IS NOT NULL THEN (sc.total_row_count - sc.deleted_row_count)
               ELSE (SELECT SUM(sc2.total_row_count) - SUM(sc2.deleted_row_count)
                       FROM storage_containers sc2
                      WHERE sc2.projection_id = p.projection_id)  
       END, 0) row_count          
  FROM storage_containers sc
  JOIN projections p
    ON sc.projection_id = p.projection_id
 GROUP BY 1, 2, 3, 4
 ORDER BY 1, 2, 3;
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica Tips, Lessons and Examples”