LOAD_STREAMS table sorted = unsorted ?

Moderator: NorbertKrupa

Post Reply
User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

LOAD_STREAMS table sorted = unsorted ?

Post by Julie » Mon Sep 22, 2014 6:58 pm

Hi all,

Does it seem to you that the output from the load streams table below is correct? Look at the last there columns. How can the unsorted row count equal the sorted row count?

Code: Select all

dbadmin=> \x
Expanded display is on.

dbadmin=> select * from load_streams;
-[ RECORD 1 ]----------+------------------------------
session_id             | vertica03-17212:0x94f31
transaction_id         | 45035994273826186
statement_id           | 2
stream_name            |
schema_name            | METRICS_DIV
table_id               | 45035396274828884
table_name             | DATA1COMPLETE
load_start             | 2014-09-22 10:43:04.995783-04
load_duration_ms       | 645345
is_executing           | f
accepted_row_count     | 0
rejected_row_count     | 0
read_bytes             | 0
input_file_size_bytes  | 0
parse_complete_percent |
unsorted_row_count     | 16525529
sorted_row_count       | 16525529
sort_complete_percent  | 100
Am I missing something here?
Thanks,
Juliette

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: LOAD_STREAMS table sorted = unsorted ?

Post by NorbertKrupa » Mon Sep 22, 2014 7:28 pm

If you take a look at the documentation:
UNSORTED_ROW_COUNT could be greater than ACCEPTED_ROW_COUNT because data is copied and sorted for every projection in the target table.
Checkout vertica.tips for more Vertica resources.

User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

Re: LOAD_STREAMS table sorted = unsorted ?

Post by Julie » Mon Sep 22, 2014 7:43 pm

That doesn't explain why in my example, unsorted_row_count = sorted_row_count ...
Thanks,
Juliette

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: LOAD_STREAMS table sorted = unsorted ?

Post by NorbertKrupa » Mon Sep 22, 2014 7:46 pm

Can you post the definition of the table (including projections)?
Checkout vertica.tips for more Vertica resources.

User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

Re: LOAD_STREAMS table sorted = unsorted ?

Post by Julie » Tue Sep 23, 2014 1:49 pm

Here is a simple example...

Code: Select all

dbadmin=> CREATE TABLE test_dim
dbadmin-> (
dbadmin(>     test_ID int NOT NULL,
dbadmin(>     test_PROFILE_DIM_ID int NOT NULL,
dbadmin(>     test_KEY varchar(14) NOT NULL,
dbadmin(>     INC_NAME varchar(100) NOT NULL,
dbadmin(>     AGE_NAME varchar(100) NOT NULL,
dbadmin(>     MAR_NAME varchar(100) NOT NULL,
dbadmin(>     EDU_NAME varchar(100) NOT NULL,
dbadmin(>     CHLD_NAME varchar(100) NOT NULL,
dbadmin(>     GEND_NAME varchar(1) NOT NULL)
dbadmin->  ORDER BY EDU_NAME,
dbadmin->           MAR_NAME,
dbadmin->           AGE_NAME,
dbadmin->           INC_NAME,
dbadmin->           GEND_NAME,
dbadmin->           CHLD_NAME,
dbadmin->           test_ID
dbadmin-> SEGMENTED BY hash(test_ID) ALL NODES KSAFE 1;
CREATE TABLE
dbadmin=> ALTER TABLE test_DIM ADD CONSTRAINT C_PRIMARY PRIMARY KEY (test_ID);
ALTER TABLE

Code: Select all

dbadmin=> copy test_dim from '/data/test_dim.txt' direct;
 Rows Loaded
-------------
    17541682
(1 row)

dbadmin=> \x
Expanded display is on.

dbadmin=> select * from load_streams where table_name = 'test_dim';
-[ RECORD 1 ]----------+------------------------------
session_id             | vertica03-17212:0x94f31
transaction_id         | 45035996273867997
statement_id           | 1
stream_name            |
schema_name            | public
table_id               | 45035996275699830
table_name             | test_dim
load_start             | 2014-09-23 08:45:11.153798-04
load_duration_ms       | 23886
is_executing           | f
accepted_row_count     | 17541682
rejected_row_count     | 0
read_bytes             | 1217838681
input_file_size_bytes  | 1217838681
parse_complete_percent | 100
unsorted_row_count     | 35083364
sorted_row_count       | 35083364
sort_complete_percent  | 100
See how the unsorted_row_count value equals the sorted_row_count value?
Thanks,
Juliette

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: LOAD_STREAMS table sorted = unsorted ?

Post by NorbertKrupa » Tue Sep 23, 2014 3:40 pm

You only have a single, default super-projection for this table?
Checkout vertica.tips for more Vertica resources.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: LOAD_STREAMS table sorted = unsorted ?

Post by scutter » Wed Sep 24, 2014 2:53 pm

Hi Julie,

I think this is just a poorly named or mis-documented column. I don’t think that the unsorted_row_count is intended to show progress as the current number of unsorted rows. It’s showing you the total number of rows going into the sort phase - the number of rows that have reached the DataTarget operator which sorts data and writes to projections - and the sorted_row_count is showing you how many rows have been sorted. So if 1000 rows are loaded, into two projections, then the unsorted_row_count would be 2000. If you had projection bloat and six projections, you’d see 6000.

The load_streams table is just a view on top of data collector tables. See vs_system_views. It’s using the ‘input rows’ counter for DataTarget to get the number of rows going into the DataTarget operator. The ‘input rows’ counter would never change after all rows get to DataTarget. So until data gets to DataTarget, the value would be 0, then it would rise as data reaches the DataTarget operator, until it reaches its max number of rows. So for a long-running load or long-running INSERT..SELECT, this value might appear to be more meaningful - you’d see it increasing. But when a load completes, this column won’t ever show zero since it’s not the remaining number of rows to sort but rather the total number of rows that have reached DataTarget.

The load_streams table was originally not a view on top of dc tables. It might be that in those older versions that unsorted_row_count was actually something that changed over the course of the load and returned to zero at the end - I don’t recall.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

Post Reply

Return to “Vertica Data Load”