How to handle duplicates in Vertica?

Posted: Sat Jun 01, 2013 12:00 am
by varuna.bhat
Hello,

Vertica allows the same data file to be loaded into a table any number of times, and each load creates duplicate rows.

Is there any way to handle these duplicates?

Also, if some rows are rejected with errors during data loading, what is the best way to handle them?

Thanks in advance.

Re: How to handle duplicates in Vertica?

Posted: Mon Jun 03, 2013 8:03 am
by nnani
Hello Varuna,

Welcome to VerticaForums.

In Vertica, constraints are not enforced the way they are in Oracle. However, there are some methods you can follow to get rid of duplicates.
Please look at the ANALYZE_CONSTRAINTS function. It may help you a bit.
Beyond that, this topic will definitely help:
http://www.vertica-forums.com/viewtopic ... ates#p2695
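
To give a rough idea of what a duplicate check looks like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database, not Vertica, and the table name `sales`, its columns, and the helper `find_duplicate_keys` are all made up for illustration. The GROUP BY ... HAVING COUNT(*) > 1 query it runs is plain SQL, so a similar query should work against a Vertica table, but treat that as an assumption and test it on your own data.

```python
import sqlite3


def find_duplicate_keys(conn, table, key_columns):
    """Return (key..., copies) for every key value that occurs more than once.

    table and key_columns are placed into the SQL text directly, so they
    must come from a trusted source (hypothetical helper, illustration only).
    """
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, COUNT(*) AS copies FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# In-memory stand-in database; 'sales' is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")

rows = [(1, 10), (2, 20), (3, 30)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
# Simulate part of the same file being loaded a second time.
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows[:2])

duplicates = find_duplicate_keys(conn, "sales", ["id"])
print(duplicates)  # [(1, 2), (2, 2)]
```

Each tuple is a key value followed by the number of copies found, so ids 1 and 2 each appear twice here.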

The second half of your question:
If any data is rejected during loading and you don't know the reason for it, you can check the rejected data file to see which rows were rejected.
Notes
When loading data with the COPY statement, COPY considers the following data invalid:
- Missing columns (too few columns in an input line).
- Extra columns (too many columns in an input line).
- Empty columns for INTEGER or DATE/TIME data types. COPY does not use the default data values defined by the CREATE TABLE command, unless you do not supply a column option as part of the COPY statement.
- Incorrect representation of data type. For example, non-numeric data in an INTEGER column is invalid.
Vertica treats this kind of data as rejected data.
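
If you want to catch such rows before loading, the rules above can be approximated with a small pre-load check. This is only a sketch in Python: the function `check_row`, the type names "int", "date" and "text", the `|` delimiter and the YYYY-MM-DD date format are all assumptions made for the example, and Vertica's own parsing accepts more formats than this does.

```python
import re
from datetime import datetime


def check_row(line, column_types, delimiter="|"):
    """Return None if the line looks loadable, else a short reason string.

    column_types is a list such as ["int", "date", "text"]; these type
    names are invented for this sketch and are not Vertica type names.
    """
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < len(column_types):
        return "too few columns"
    if len(fields) > len(column_types):
        return "too many columns"

    for position, (value, ctype) in enumerate(zip(fields, column_types), start=1):
        if ctype == "int":
            if value == "":
                return f"empty INTEGER column {position}"
            if not re.fullmatch(r"[+-]?\d+", value):
                return f"column {position} is not a valid INTEGER"
        elif ctype == "date":
            if value == "":
                return f"empty DATE column {position}"
            try:
                # Assumed date format for this sketch: YYYY-MM-DD.
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                return f"column {position} is not a valid DATE"
    return None


types = ["int", "date", "text"]
for sample in ["1|2013-06-01|widget", "1|2013-06-01", "abc|2013-06-01|widget"]:
    print(repr(sample), "->", check_row(sample, types))
```

Rows for which `check_row` returns a reason can be fixed or set aside before running COPY.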
The best way to handle it: use the NO COMMIT option with the COPY command. With this option, COPY does not commit your data when the statement finishes. You can then check your rejected data file for any records. If you find any, correct the data and load (COPY) it again.
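
As a sketch of that workflow, the Python below only builds the statements as text and does not connect to a database. NO COMMIT, REJECTED DATA and EXCEPTIONS are COPY options, but the table name, the file paths and both helper functions are hypothetical. Rolling back when rejects exist is one reading of "correct the data and load again", so that the corrected file is not loaded on top of the first attempt. The file check assumes the script runs on the machine where the rejected data file is written.

```python
import os


def quote(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_statement(table, data_path, rejected_path, exceptions_path, delimiter="|"):
    """Build a COPY statement that leaves the load uncommitted.

    table is inserted as-is, so it must come from a trusted source.
    """
    return (
        f"COPY {table} FROM {quote(data_path)} "
        f"DELIMITER {quote(delimiter)} "
        f"EXCEPTIONS {quote(exceptions_path)} "
        f"REJECTED DATA {quote(rejected_path)} "
        "NO COMMIT;"
    )


def statement_after_review(rejected_path):
    """Return ROLLBACK if the rejected data file has content, else COMMIT."""
    has_rejects = os.path.exists(rejected_path) and os.path.getsize(rejected_path) > 0
    return "ROLLBACK;" if has_rejects else "COMMIT;"


# Hypothetical table and paths, for illustration only.
print(build_copy_statement("public.sales", "/data/sales.dat",
                           "/tmp/sales.rej", "/tmp/sales.exc"))
print(statement_after_review("/tmp/sales.rej"))
```

The exceptions file is worth reading alongside the rejected data file, since it describes why each row was rejected.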

Hope this helps.

Re: How to handle duplicates in Vertica?

Posted: Tue Jun 04, 2013 11:58 pm
by varuna.bhat
Thanks Nnani for your reply.