Hi all,
I am curious about other opinions as to which is better in Vertica... a composite PK or a surrogate key. I prefer surrogate keys, but I have some team members who disagree stating that they are to difficult to use when moving data between databases/comparing data between database, etc.
Please let me know which you prefer based on your experience
-Brett
Composite Key vs. Surrogate Key
Moderator: NorbertKrupa
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Composite Key vs. Surrogate Key
IMHO, I always use surrogate keys that are integers. Joining on INT values is orders of magnitude faster than joining on a crazy number of varchar columns that I've seen defined in a PK in client database tables.
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: Composite Key vs. Surrogate Key
Hi!
[DELETED]
[DELETED]
Last edited by id10t on Wed May 06, 2015 4:32 pm, edited 1 time in total.
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Composite Key vs. Surrogate Key
If you have a large enough data set, it will make a difference. Joining on a single int vs. joining on multiple mixed data types will make a difference.sKwa wrote:Hi!
Once at AT&T I tested SK vs NK (surrogate vs natural key), where NK were 10 fields and didn't find any advantage from point of view of performance...
Checkout vertica.tips for more Vertica resources.
Re: Composite Key vs. Surrogate Key
Hi!
[DELETED]
[DELETED]
Last edited by id10t on Wed May 06, 2015 4:31 pm, edited 1 time in total.
Re: Composite Key vs. Surrogate Key
@sKwa
I highly doubt that there is no performance advantage at query time. You may be able to mitigate some of the performance degradation (perhaps grouping fields in the projection, for example) but even still there would be some performance difference in favor of a single integer key join. Be realistic, you're joining 10 columns... in a columnar database. It's impossible to state there is no performance difference assuming both tests were fully optimized.
It's very situational, though. Generating a surrogate key and correlating it will slow down your loading ability. Also, you may benefit less by using a surrogate key since you'd need to sort on it instead of possibly more realistic predicate fields. I do not prefer using a surrogate key over a natural key generally speaking, but in some cases it might be preferred. Some people are religious about using it, though.
On a side note, I'm pretty sure @norbertk did not imply that AT&T has small data sets, that's a straw man and really doesn't prove your point at all.
I highly doubt that there is no performance advantage at query time. You may be able to mitigate some of the performance degradation (perhaps grouping fields in the projection, for example) but even still there would be some performance difference in favor of a single integer key join. Be realistic, you're joining 10 columns... in a columnar database. It's impossible to state there is no performance difference assuming both tests were fully optimized.
It's very situational, though. Generating a surrogate key and correlating it will slow down your loading ability. Also, you may benefit less by using a surrogate key since you'd need to sort on it instead of possibly more realistic predicate fields. I do not prefer using a surrogate key over a natural key generally speaking, but in some cases it might be preferred. Some people are religious about using it, though.
On a side note, I'm pretty sure @norbertk did not imply that AT&T has small data sets, that's a straw man and really doesn't prove your point at all.
Re: Composite Key vs. Surrogate Key
Hi!
[DELETED]
[DELETED]
Last edited by id10t on Wed May 06, 2015 4:31 pm, edited 1 time in total.