Page 1 of 1

LOAD_STREAMS table sorted = unsorted ?

Posted: Mon Sep 22, 2014 6:58 pm
by Julie
Hi all,

Does it seem to you that the output from the load streams table below is correct? Look at the last there columns. How can the unsorted row count equal the sorted row count?

Code: Select all

dbadmin=> \x
Expanded display is on.

dbadmin=> select * from load_streams;
-[ RECORD 1 ]----------+------------------------------
session_id             | vertica03-17212:0x94f31
transaction_id         | 45035994273826186
statement_id           | 2
stream_name            |
schema_name            | METRICS_DIV
table_id               | 45035396274828884
table_name             | DATA1COMPLETE
load_start             | 2014-09-22 10:43:04.995783-04
load_duration_ms       | 645345
is_executing           | f
accepted_row_count     | 0
rejected_row_count     | 0
read_bytes             | 0
input_file_size_bytes  | 0
parse_complete_percent |
unsorted_row_count     | 16525529
sorted_row_count       | 16525529
sort_complete_percent  | 100
Am I missing something here?

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Mon Sep 22, 2014 7:28 pm
by NorbertKrupa
If you take a look at the documentation:
UNSORTED_ROW_COUNT could be greater than ACCEPTED_ROW_COUNT because data is copied and sorted for every projection in the target table.

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Mon Sep 22, 2014 7:43 pm
by Julie
That doesn't explain why in my example, unsorted_row_count = sorted_row_count ...

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Mon Sep 22, 2014 7:46 pm
by NorbertKrupa
Can you post the definition of the table (including projections)?

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Tue Sep 23, 2014 1:49 pm
by Julie
Here is a simple example...

Code: Select all

dbadmin=> CREATE TABLE test_dim
dbadmin-> (
dbadmin(>     test_ID int NOT NULL,
dbadmin(>     test_PROFILE_DIM_ID int NOT NULL,
dbadmin(>     test_KEY varchar(14) NOT NULL,
dbadmin(>     INC_NAME varchar(100) NOT NULL,
dbadmin(>     AGE_NAME varchar(100) NOT NULL,
dbadmin(>     MAR_NAME varchar(100) NOT NULL,
dbadmin(>     EDU_NAME varchar(100) NOT NULL,
dbadmin(>     CHLD_NAME varchar(100) NOT NULL,
dbadmin(>     GEND_NAME varchar(1) NOT NULL)
dbadmin->  ORDER BY EDU_NAME,
dbadmin->           MAR_NAME,
dbadmin->           AGE_NAME,
dbadmin->           INC_NAME,
dbadmin->           GEND_NAME,
dbadmin->           CHLD_NAME,
dbadmin->           test_ID
dbadmin-> SEGMENTED BY hash(test_ID) ALL NODES KSAFE 1;
CREATE TABLE
dbadmin=> ALTER TABLE test_DIM ADD CONSTRAINT C_PRIMARY PRIMARY KEY (test_ID);
ALTER TABLE

Code: Select all

dbadmin=> copy test_dim from '/data/test_dim.txt' direct;
 Rows Loaded
-------------
    17541682
(1 row)

dbadmin=> \x
Expanded display is on.

dbadmin=> select * from load_streams where table_name = 'test_dim';
-[ RECORD 1 ]----------+------------------------------
session_id             | vertica03-17212:0x94f31
transaction_id         | 45035996273867997
statement_id           | 1
stream_name            |
schema_name            | public
table_id               | 45035996275699830
table_name             | test_dim
load_start             | 2014-09-23 08:45:11.153798-04
load_duration_ms       | 23886
is_executing           | f
accepted_row_count     | 17541682
rejected_row_count     | 0
read_bytes             | 1217838681
input_file_size_bytes  | 1217838681
parse_complete_percent | 100
unsorted_row_count     | 35083364
sorted_row_count       | 35083364
sort_complete_percent  | 100
See how the unsorted_row_count value equals the sorted_row_count value?

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Tue Sep 23, 2014 3:40 pm
by NorbertKrupa
You only have a single, default super-projection for this table?

Re: LOAD_STREAMS table sorted = unsorted ?

Posted: Wed Sep 24, 2014 2:53 pm
by scutter
Hi Julie,

I think this is just a poorly named or mis-documented column. I don’t think that the unsorted_row_count is intended to show progress as the current number of unsorted rows. It’s showing you the total number of rows going into the sort phase - the number of rows that have reached the DataTarget operator which sorts data and writes to projections - and the sorted_row_count is showing you how many rows have been sorted. So if 1000 rows are loaded, into two projections, then the unsorted_row_count would be 2000. If you had projection bloat and six projections, you’d see 6000.

The load_streams table is just a view on top of data collector tables. See vs_system_views. It’s using the ‘input rows’ counter for DataTarget to get the number of rows going into the DataTarget operator. The ‘input rows’ counter would never change after all rows get to DataTarget. So until data gets to DataTarget, the value would be 0, then it would rise as data reaches the DataTarget operator, until it reaches its max number of rows. So for a long-running load or long-running INSERT..SELECT, this value might appear to be more meaningful - you’d see it increasing. But when a load completes, this column won’t ever show zero since it’s not the remaining number of rows to sort but rather the total number of rows that have reached DataTarget.

The load_streams table was originally not a view on top of dc tables. It might be that in those older versions that unsorted_row_count was actually something that changed over the course of the load and returned to zero at the end - I don’t recall.

—Sharon