Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Moderator: NorbertKrupa

Post Reply
son2
Newbie
Newbie
Posts: 3
Joined: Fri Feb 19, 2016 1:46 pm

Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Post by son2 » Tue Mar 08, 2016 11:15 am

While joining two tables in vertica, in terms of performance wise, the query will perform good while joining on two integer fields or joining on VARCHAR fields.
As the data are stored in Vertica in encrypted format, so while performing join how the data is being read from Vertica. If the tables are joined on integer fileds then how the data is being read from Vertica or IF the join is on varchar fields, then how the data is being read from Vertica.

Which one will improve performance ?

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Post by NorbertKrupa » Tue Mar 08, 2016 1:37 pm

At the machine level, there are instructions which can perform integer comparisons in one cycle and are usually stored in smaller registers.

With strings (or VARCHAR) which can be varying lengths, and characters, you have to compare each character of the string as well as NULLs. Since these characters are translated into integers at the machine level, the machine is now doing a loop over the entire string.

Some databases do different optimizations to improve performance on string comparisons. As a rule of thumb, a JOIN on an integer is usually better.

I always say, test it yourself.

TLDR: An integer comparison at a machine level will always be faster than a string comparison
Checkout vertica.tips for more Vertica resources.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Post by JimKnicely » Tue Mar 08, 2016 10:21 pm

Hi,

I agree with Norbert in that typically joining on columns having the INT datatype is better than joining on columns that have a VARCHAR datatype. Although when tested at client sites, I typically only see a performance degradation with VARCHARs when they are significantly larger than 8 bytes in size. I'm guessing because integers in Vertica use 8 bytes ;)
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

son2
Newbie
Newbie
Posts: 3
Joined: Fri Feb 19, 2016 1:46 pm

Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Post by son2 » Thu Mar 10, 2016 3:25 am

As the vertica uses Hash or Mege join while joing two tables; so even if we join two tables in vertica with Joining fields as varchar field, It will perform fast to retrieve the records ?

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?

Post by NorbertKrupa » Thu Mar 10, 2016 8:53 pm

Please read my post on the difference between merge and hash joins.
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “General”