Row Counts for Tables Discrepancies

Moderator: NorbertKrupa

clv100
Beginner
Beginner
Posts: 27
Joined: Tue Oct 16, 2012 6:04 pm

Row Counts for Tables Discrepancies

Post by clv100 » Thu Sep 26, 2013 4:34 pm

Our daily processing involves updating/inserting from a stage table to a fact table. Each fact table load is handled independently and an update is done if the key values match between the stage and fact OR an insert is done if the keys do not exist. We take a count of the table before the update/inserts and audit between the stage table and the fact table after each load.
Last week we had a "data integrity violation" on a delete during the update/insert for a table. Our load stopped and we re-ran the load for the table. We knew our counts would be off. But over the next few days we started to get bad counts on that table every day. I checked and found that the data spanning the previous week and current week had dupes. I isolated the duplicate keys for each week. When I did a select for the number of rows having duplicate keys in the previous week I got 22 rows (on 11 duplicate keys) - however, when I went to do the delete (without a commit at the time) - it would have deleted 54 rows!!!

First, what causes the "data integrity issue" on the delete?
And,second, why would I get different counts on a select vs a delete??

Any help would be appreciated!! Thanks,
CVezza

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Row Counts for Tables Discrepancies

Post by scutter » Thu Sep 26, 2013 7:12 pm

Hi CVezza,

Data integrity errors like this should go through a Vertica Support ticket. There are multiple possible root causes.

Check to make sure that all of your projections are returning the same result for the SELECT - reference the projections by name in the query to check that.

--Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Row Counts for Tables Discrepancies

Post by id10t » Thu Sep 26, 2013 7:15 pm

Hi!

Constraints violation analysis based on PK/FK and unique field, while DELETE deletes data based on filtering.
Without table definition (especially definitions of PK/FK/UNIQUE ) and delete query it's very hard to say what going on in your case.

Based on info that you provided, what I was doing?
1. Create a clone of tables
2. Run DELETE
After it I was checking (when I see/know a deleted data): Why it deleted? Why condition returns TRUE ? Why there are difference in rows between analyze_constraints and DELETE? You must to check and validate(better to prove it mathematically) your BOOLEAN CONDITION to be sure in process.

clv100
Beginner
Beginner
Posts: 27
Joined: Tue Oct 16, 2012 6:04 pm

Re: Row Counts for Tables Discrepancies

Post by clv100 » Thu Sep 26, 2013 7:58 pm

I have opened the case with Vertica but as of yet, they have not responded to my questions specifically about the data integrity issue - I understand that an update is really a delete/insert and somehow during this "data integrity violation" we ended up with duplicates in our table - since I am definitely a Vertica newbie, I am totally confused about the difference between my "select" and my "delete" row counts when I am using the exact same criteria for both SQL statements. I am inclined to believe that when it is giving me a delete count it is including those records that have been MARKED as deletes (after an update) - on top of the "real" rows in the fact table.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Row Counts for Tables Discrepancies

Post by JimKnicely » Thu Sep 26, 2013 9:27 pm

This is interesting. #clv100, can you post the select and delete statements so that we can take a gander at 'em?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

clv100
Beginner
Beginner
Posts: 27
Joined: Tue Oct 16, 2012 6:04 pm

Re: Row Counts for Tables Discrepancies

Post by clv100 » Thu Oct 03, 2013 6:59 pm

This problem has continued over the past two weeks. It is only happening on this ONE table. I have been in contact with Vertica but they have not given me a reason WHY this is happening. My solution to the first instance was to back up all the duplicate rows that were in the table,delete them from the table and then re-load the last 'U' record for each week,store,item key - I don't know if this was a result of the data integrity violation?? Our SQL process for all our tables is based on an UPDATE to the column from the staging table where the staging keys match the table keys -if they do- the column is updated (a 'U' is set for the upd_flag- this updates all the rows for that column where the keys match) OR if the keys do not match - the entire row for each column is inserted (upd_flag 'I') into the table - this has been working for ALL the tables until 3 weeks ago when we got the data integrity violation on this ONE table. I initially know there is a problem when my audit control tells me that my staging counts for updates and inserts do not match my table counts for the updates and inserts - in other words I am updating MORE rows on the table than I have in the staging table. Sure enough when I check, I find I have a scenario such as this:

week_gen_key store_gen_key item_gen_key upd_flag comp_flag maint_date
52013033 500920 9000767453 I 99 2013-09-16
52013033 500920 9000767453 U 99 2013-09-17
52013033 500920 9000767453 U 99 2013-09-19
52013033 500920 9000767453 U 99 2013-09-22
52013033 500920 9000767453 U 99 2013-09-22

Now in this case I should ONLY HAVE ONE ROW in my fact table!!! The 'I' record should have been updated to a 'U' on 9-17-2013 but as you can see from this list - I have multiple rows for the same keys!!! My solution was to unload the duplicate rows - delete them for the table and then re-load the last 'U' record for that set of keys. When I check - there are NO duplicates in my table after this process. This has been happening now arbitrarily for the past 3 weeks - I get bad counts and I check an find I have duplicates BUT there has never been another "data integrity violation" - until this morning. Nothing in my table would update this morning and there were at least a 1/2 dozen column data integrity violations during the update and nothing was committed although I did have INSERTS.

vsql:/TRDW/PROD/scripts/nonsales_scripts/load/load_f_inventory.sql:74: ERROR 3030: Delete: could not find a data row to delete (data integrity violation?)
13-10-03 04:24:07[process_nonsales_files]\t--- Error loading table f_inventory.

This is the table definition:
CREATE TABLE non_sales.f_inventory(
week_gen_key INTEGER NOT NULL,
item_gen_key INTEGER NOT NULL,
store_gen_key INTEGER NOT NULL,
back_order_u INTEGER,
drp_ship_susp_u INTEGER,
in_transit_d DECIMAL(14, 4),
in_transit_u INTEGER,
perp_on_hand_d DECIMAL(14, 4),
perp_on_hand_u INTEGER,
safe_act_stk_d DECIMAL(14, 4),
safe_act_stk_u INTEGER,
shipment_d DECIMAL(14, 4),
shipment_u INTEGER,
whse_shpmt_d DECIMAL(14, 4),
whse_shpmt_u INTEGER,
whse_susp_u INTEGER,
whse_trnsf_stat_u INTEGER,
whse_work_stat_u INTEGER,
return_vendor_u INTEGER,
return_vendor_d DECIMAL(14, 4),
store_distrib INTEGER,
safe_stk_u INTEGER,
avg_cost DECIMAL(14, 4),
drp_ship_susp_d DECIMAL(14, 4),
back_order_d DECIMAL(14, 4),
can_curr_rate DECIMAL(14, 4),
metric1 INTEGER,
metric2 DECIMAL(14,4),
metric3 INTEGER,
metric4 DECIMAL(14,4),
metric5 INTEGER,
metric6 DECIMAL(14,4),
comp_flag INTEGER,
upd_flag CHAR(1),
maint_date DATE,
CONSTRAINT fk_week_key FOREIGN KEY (week_gen_key) REFERENCES public.d_period(period_gen_key),
CONSTRAINT fk_store_key FOREIGN KEY (store_gen_key) REFERENCES public.d_location(loc_gen_key),
CONSTRAINT fk_item_key FOREIGN KEY (item_gen_key) REFERENCES public.d_product(prod_gen_key)
);

And this is a snippet from our SQL load script for this table for the update logic:

UPDATE non_sales.f_inventory
SET
back_order_u = non_sales.stage_f_inventory.back_order_u

FROM non_sales.stage_f_inventory
WHERE non_sales.f_inventory.week_gen_key = non_sales.stage_f_inventory.week_gen_key AND
non_sales.f_inventory.item_gen_key = non_sales.stage_f_inventory.item_gen_key AND
non_sales.f_inventory.store_gen_key = non_sales.stage_f_inventory.store_gen_key AND
non_sales.stage_f_inventory.back_order_u IS NOT NULL ;

UPDATE non_sales.f_inventory
SET
drp_ship_susp_u = non_sales.stage_f_inventory.drp_ship_susp_u

FROM non_sales.stage_f_inventory
WHERE non_sales.f_inventory.week_gen_key = non_sales.stage_f_inventory.week_gen_key AND
non_sales.f_inventory.item_gen_key = non_sales.stage_f_inventory.item_gen_key AND
non_sales.f_inventory.store_gen_key = non_sales.stage_f_inventory.store_gen_key AND
non_sales.stage_f_inventory.drp_ship_susp_u IS NOT NULL ;

UPDATE non_sales.f_inventory
SET
in_transit_d = non_sales.stage_f_inventory.in_transit_d

FROM non_sales.stage_f_inventory
WHERE non_sales.f_inventory.week_gen_key = non_sales.stage_f_inventory.week_gen_key AND
non_sales.f_inventory.item_gen_key = non_sales.stage_f_inventory.item_gen_key AND
non_sales.f_inventory.store_gen_key = non_sales.stage_f_inventory.store_gen_key AND
non_sales.stage_f_inventory.in_transit_d IS NOT NULL ;

The same UPDATE approach is being used in all our other tables and we have not had any issues - We are NOT receiving duplicate records in the INPUT file to our process for this table - Our approach today is to delete ALL the duplicate keys for this week period - unload the f_inventory table to a flat file - drop cascade/create the table and then COPY the flat file back to the f_inventory table and THEN re-run our Update/Insert process - those duplicate key records SHOULD be re-inserted with an upd_flag of 'I' and the INSERTS were committed to the f_inventory table this morning - should be UPDATES when the load process is re-run. We are at a loss as to why this is happening and why only on this table. And as I said before, we are very very NEW to Vertica.

clv100
Beginner
Beginner
Posts: 27
Joined: Tue Oct 16, 2012 6:04 pm

Re: Row Counts for Tables Discrepancies

Post by clv100 » Thu Oct 03, 2013 7:23 pm

P.S. Duplicates from this morning:

store_gen_key week_gen_key item_gen_key upd_flag maint_date
500920 52013035 9000767453 U 2013-10-01
500920 52013035 9000767453 U 2013-10-01
500920 52013035 9000767453 U 2013-10-02
500920 52013035 9000767466 U 2013-10-01
500920 52013035 9000767466 U 2013-10-01
500920 52013035 9000767466 U 2013-10-02
500920 52013035 9000767479 U 2013-10-01
500920 52013035 9000767479 U 2013-10-01
500920 52013035 9000767479 U 2013-10-02
500920 52013035 9000767563 U 2013-10-01
500920 52013035 9000767563 U 2013-10-01
500920 52013035 9000767563 U 2013-10-02
500920 52013035 9000767602 U 2013-10-01
500920 52013035 9000767602 U 2013-10-01
500920 52013035 9000767602 U 2013-10-02
500920 52013035 9000767628 U 2013-10-01
500920 52013035 9000767628 U 2013-10-01
500920 52013035 9000767628 U 2013-10-02
500920 52013035 9000767644 U 2013-10-01
500920 52013035 9000767644 U 2013-10-01
500920 52013035 9000767644 U 2013-10-02
500920 52013035 9000767738 U 2013-10-01
500920 52013035 9000767738 U 2013-10-01
500920 52013035 9000767738 U 2013-10-02
500920 52013035 9000767929 U 2013-10-01
500920 52013035 9000767929 U 2013-10-01
500920 52013035 9000767929 U 2013-10-02
500920 52013035 9000773447 U 2013-10-01
500920 52013035 9000773447 U 2013-10-01
500920 52013035 9000773447 U 2013-10-02
500920 52013035 9000776101 U 2013-10-01
500920 52013035 9000776101 U 2013-10-01
500920 52013035 9000776101 U 2013-10-02
500920 52013035 9000776169 U 2013-10-01
500920 52013035 9000776169 U 2013-10-01
500920 52013035 9000776169 U 2013-10-02

I did an ANALYZE_CONSTRAINTS on the table non_sales.f_inventory and nothing was returned. This morning when I did a :

select store_gen_key,week_gen_key,item_gen_key,count(item_gen_key) from non_sales.f_inventory where week_gen_key = 52013035 group by store_gen_key,week_gen_key,item_gen_key having count(item_gen_key) > 1;

I got 12 duplicate keys with 3 rows each or 36 rows (see above)
I just ran the delete:
delete from non_sales.f_inventory where store_gen_key = 500920 and week_gen_key = 52013035
and item_gen_key in (9000767644,9000767738,9000767929,9000767453,9000767466,9000767479,9000767563,9000773447,9000776101,9000767602,9000767628,9000776169);

commit;

and only got a count of 12!!!
Yet, when I checked my row count on the f_inventory table it was 36 rows less than the count this morning!!!

Post Reply

Return to “New to Vertica Database Administration”