Behavior of ERROR 3149 for duplicate values

Moderator: NorbertKrupa

Post Reply
NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Behavior of ERROR 3149 for duplicate values

Post by NorbertKrupa » Wed Apr 23, 2014 10:21 pm

I'm trying to understand the behavior of Error 3149 for duplicate values on primary keys when performing joins. In the tests below, the inner table contains a duplicate value and is joined to another table with various column definitions. What I observed is that when the inner table is joined to the outer table on a primary key, no error is thrown. Is this expected behavior (with regards to errors being thrown)?

This was tested on v7.0.1-0.

Inner table

Code: Select all

CREATE TABLE public.table1 (
    id int PRIMARY KEY,
    val char(1)
);

INSERT INTO public.table1 VALUES (1, 'a');
INSERT INTO public.table1 VALUES (1, 'b');

SELECT * FROM public.table1;
 id | val
----+-----
  1 | a
  1 | b
(2 rows)

SELECT ANALYZE_CONSTRAINTS('public.table1');
 Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-------------+------------+--------------+-----------------+-----------------+---------------
 public      | table1     | id           | C_PRIMARY       | PRIMARY         | ('1')
(1 row)
Test 1

When table1 is joined to a primary key column in table2, no error is thrown.

Code: Select all

CREATE TABLE public.table2 (
    id int PRIMARY KEY
);

INSERT INTO public.table2 VALUES (1);

SELECT a.val 
FROM   public.table1 a 
       JOIN public.table2 b 
         ON a.id = b.id;

 val
-----
 a
 b
(2 rows)
Test 2

When table1 is joined to a non-key column in table2, the error is thrown.

Code: Select all

DROP TABLE public.table2 CASCADE;

CREATE TABLE public.table2 (
    id int
);

INSERT INTO public.table2 VALUES (1);

SELECT a.val 
FROM   public.table1 a 
       JOIN public.table2 b 
         ON a.id = b.id;

ERROR 3149:  Duplicate primary/unique key detected in join [(public.table2 x public.table1) using table2_b0 and table1_b0 (PATH ID: 1)]; value [1]
Test 3

When table1 is joined to a primary key column containing the same duplicates in table2, the error is thrown.

Code: Select all

DROP TABLE public.table2 CASCADE;

CREATE TABLE public.table2 (
    id int PRIMARY KEY
);

INSERT INTO public.table2 VALUES (1);
INSERT INTO public.table2 VALUES (1);

SELECT a.val 
FROM   public.table1 a 
       JOIN public.table2 b 
         ON a.id = b.id;

ERROR 3149:  Duplicate primary/unique key detected in join [(public.table1 x public.table2) using table1_b0 and table2_b0 (PATH ID: 1)]; value [1]
Cross posted on Vertica Community.
Checkout vertica.tips for more Vertica resources.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Behavior of ERROR 3149 for duplicate values

Post by JimKnicely » Thu Apr 24, 2014 3:09 am

Hi,

I wonder if this behavior is because you are probably getting a MERGE JOIN by your queries?

According the admin guide on page 426, a MERGE command will fail with a duplicate run time error if:
  • Duplicates in the source table. If HP Vertica finds more than one matching value in the source join column for a corresponding value in the target table, MERGE fails with a run-time error.
    Duplicates in the target table. If HP Vertica finds more than one matching value in target join column for a corresponding value in the source table, and the target join column has a unique or primary key constraint, MERGE fails with a run-time error. If the target join column has no such constraint, the statement runs without error and without optimization.
Although you aren't explicitly running a MERGE statement, the MERGE JOIN probably has the same logic.

What do you think?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Behavior of ERROR 3149 for duplicate values

Post by NorbertKrupa » Thu Apr 24, 2014 2:09 pm

Thanks, Jim! Looking at the explain plans for each test, all use a MERGEJOIN. If I'm reading the criteria correctly, test 2 & 3 throw errors as described. I feel it would seem logical that test 1 would also throw an error.

Test 1

Code: Select all

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 206, Rows: 2 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (a.id = b.id)
 |  Materialize at Output: a.val
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for a [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.table1_b0
 | |      Materialize: a.id
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(MergeJoin): a.id)
 | +-- Inner -> STORAGE ACCESS for b [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.table2_b0
 | |      Materialize: b.id
 | |      Execute on: All Nodes
Test 2

Code: Select all

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 6, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (a.id = b.id)
 |  Materialize at Output: a.val
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for a [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.table1_b0
 | |      Materialize: a.id
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(MergeJoin): a.id)
 | +-- Inner -> STORAGE ACCESS for b [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.table2_b0
 | |      Materialize: b.id
 | |      Execute on: All Nodes
Test 3

Code: Select all

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 206, Rows: 2 (NO STATISTICS)] (PATH ID: 1)
 |  Join Cond: (a.id = b.id)
 |  Materialize at Output: a.val
 |  Execute on: All Nodes
 | +-- Outer -> STORAGE ACCESS for a [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 2)
 | |      Projection: public.table1_b0
 | |      Materialize: a.id
 | |      Execute on: All Nodes
 | |      Runtime Filter: (SIP1(MergeJoin): a.id)
 | +-- Inner -> STORAGE ACCESS for b [Cost: 202, Rows: 10K (NO STATISTICS)] (PATH ID: 3)
 | |      Projection: public.table2_b0
 | |      Materialize: b.id
 | |      Execute on: All Nodes
Checkout vertica.tips for more Vertica resources.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Behavior of ERROR 3149 for duplicate values

Post by JimKnicely » Thu Apr 24, 2014 2:45 pm

Maybe your test1 doesn't throw an error because there aren't dups in table2 (the target table)?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Behavior of ERROR 3149 for duplicate values

Post by NorbertKrupa » Thu Apr 24, 2014 2:54 pm

That would make sense. My thinking is that there isn't dupes in the target table in test 2, so just strange to me they chose this logic if this is the case.
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “Vertica SQL”