I think this is just a poorly named or mis-documented column. I don’t think unsorted_row_count is intended to show progress as the current number of unsorted rows. It shows the total number of rows going into the sort phase, that is, the number of rows that have reached the DataTarget operator, which sorts data and writes to projections. sorted_row_count shows how many of those rows have been sorted. So if 1000 rows are loaded into two projections, unsorted_row_count would be 2000. If you had projection bloat and six projections, you’d see 6000.
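To make the arithmetic concrete, here is a minimal sketch of the relationship described above. The function name and parameters are hypothetical and only illustrate the multiplication; this is not anything Vertica exposes.

```python
def expected_unsorted_row_count(rows_loaded: int, projection_count: int) -> int:
    """Rows reaching DataTarget: each loaded row is counted once per projection.

    Hypothetical helper for illustration; names are not part of Vertica.
    """
    if rows_loaded < 0 or projection_count < 0:
        raise ValueError("counts must be non-negative")
    return rows_loaded * projection_count


# The two cases from the text: 1000 rows into two and into six projections.
two_projections = expected_unsorted_row_count(1000, 2)
six_projections = expected_unsorted_row_count(1000, 6)
```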
The load_streams table is just a view on top of data collector tables. See vs_system_views. It uses the ‘input rows’ counter for DataTarget to get the number of rows going into the DataTarget operator. That counter never changes once all rows have reached DataTarget. So the value is 0 until data gets to DataTarget, then rises as data reaches the operator, and stops at the total number of rows. For a long-running load or long-running INSERT..SELECT, the value might appear more meaningful because you’d see it increasing. But when a load completes, the column never returns to zero, since it is not the number of rows remaining to sort but the total number of rows that have reached DataTarget.
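The behaviour described above can be pictured with a toy model. This is only a sketch under my own assumptions (rows arrive in batches, and sorting lags one batch behind); it does not reflect how Vertica actually schedules work, and `simulate_load` is a made-up name.

```python
def simulate_load(batches, projections=1):
    """Toy model of the two counters over the life of a load.

    Returns a list of (unsorted_row_count, sorted_row_count) snapshots.
    Assumption for illustration only: sorting lags one batch behind arrival.
    """
    input_rows = 0    # cumulative 'input rows' seen by DataTarget
    sorted_rows = 0   # cumulative rows already sorted
    pending = 0       # rows that have arrived but are not yet sorted
    history = [(input_rows, sorted_rows)]
    for batch in batches:
        sorted_rows += pending            # the previous batch finishes sorting
        pending = batch * projections     # each row is counted once per projection
        input_rows += pending             # the counter only ever goes up
        history.append((input_rows, sorted_rows))
    sorted_rows += pending                # the last batch finishes sorting
    history.append((input_rows, sorted_rows))
    return history


# 1000 rows arriving as two batches, loaded into two projections.
history = simulate_load([400, 600], projections=2)
```

In this model the first value in each snapshot starts at 0, climbs while data arrives, and stays at its maximum after the load finishes, while the second value catches up to it.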
The load_streams table was not originally a view on top of DC tables. It might be that in those older versions unsorted_row_count actually changed over the course of the load and returned to zero at the end. I don’t recall.
Vertica Consultant, Zazz Technologies LLC