Hi,
I am building my first Vertica schema and am wondering if I should add foreign key constraints (RI) in the tables. Coming from an OTLP background, I am a big proponent of RI. How well does Vertica work RI and are there any pros and cons? Can I turn it on and off during data loading time? Any ideas are greatly appreciated!
RI or not RI?
Moderator: NorbertKrupa
Re: RI or not RI?
Hello Peng,
Unfortunately you cannot enforce Primary key constraint or any constraint while loading data into Vertica.
Vertica checks for Constraint violation when queries are run and not when data is loaded.
Vertica does not check for any constraints while loading data. So it is possible that you may end up with duplicate data even though you defined one of your columns as a primary key in the table.
Offcourse, there are workarounds and methods to do enforce unique data into you primary key column with Vertica.
The Copy command comes with a parameter called 'NO commit'. you can use this and load with COPY command. The transaction will not be commited, then you can check for corrupt data through the reject files, When you find the corrupted data and repair it, again run the COPY command.
And also you can use the 'ANALYZE_CONSTRAINT' function after loading the data using the no commit option.
This fuction can show you where there are constraint violation, The function can be run on database level,schema level, table level.
Hope this helped you.
Unfortunately you cannot enforce Primary key constraint or any constraint while loading data into Vertica.
Vertica checks for Constraint violation when queries are run and not when data is loaded.
Vertica does not check for any constraints while loading data. So it is possible that you may end up with duplicate data even though you defined one of your columns as a primary key in the table.
Offcourse, there are workarounds and methods to do enforce unique data into you primary key column with Vertica.
The Copy command comes with a parameter called 'NO commit'. you can use this and load with COPY command. The transaction will not be commited, then you can check for corrupt data through the reject files, When you find the corrupted data and repair it, again run the COPY command.
And also you can use the 'ANALYZE_CONSTRAINT' function after loading the data using the no commit option.
This fuction can show you where there are constraint violation, The function can be run on database level,schema level, table level.
Hope this helped you.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: RI or not RI?
There is one exception... A table's foreign key constraints are enforced during data load if there is a pre-join projection that has that table as its anchor table.
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: RI or not RI?
Yes, That right. It is the only exception
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: RI or not RI?
Peng,
When I first started using Vertica I didn't think it was important to use FK constraints. However, I've learned two important reasons why we would want to use them (other than the obvious data integrity benefit).
When I first started using Vertica I didn't think it was important to use FK constraints. However, I've learned two important reasons why we would want to use them (other than the obvious data integrity benefit).
- 1. In Vertica, the fact table's join columns are required to have FOREIGN KEY constraints in order to participate in pre-join projections. So if we want to take advantage of the pre-join projections and the performance gains they can provide, we'll need to create FKs.
2. Creating FK constraints helps Vertica know which tables are the dimension tables and which are the fact tables. This helps Vertica decide on the best explain plan for your queries. If you are not segmenting your dimension tables, and Vertica can identify the dimension tables, it'll know that it can find that data locally on the server the portion of the query is executing in parallel on (based on data in your fact table, which is segmented).
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: RI or not RI?
Thank you both nnani and Jim for your advise! It is helpful to understand how the copy process works and how RI affects Vertica.
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: RI or not RI?
How about loading into a temp table and using MERGE?nnani wrote:Offcourse, there are workarounds and methods to do enforce unique data into you primary key column with Vertica.
Checkout vertica.tips for more Vertica resources.