Using hash function to determine which node data gets segmented on

Moderator: NorbertKrupa

Post Reply
personwithhair
Newbie
Newbie
Posts: 1
Joined: Tue Jul 28, 2015 5:45 pm

Using hash function to determine which node data gets segmented on

Post by personwithhair » Thu Sep 17, 2015 7:01 am

I'm curious if there is a way using the hash function or something else to determine exactly which node a particular value will get segmented on? In Teradata you would use hashamp(), for example, to determine this.

Thanks!

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Using hash function to determine which node data gets segmented on

Post by NorbertKrupa » Thu Sep 17, 2015 7:22 pm

If you wanted to implement using HASH, you would have to reach out to Vertica as it's still proprietary. There are plans to make this function open source in an upcoming release. Until then, you can use a custom immutable function such as ABS(FLOOR(column/2)).

However, you would use get_projection_segments to get the ranges for each node.

Code: Select all

dbadmin=> select get_table_projections('public.customer_dimension');
                                                                                                                                                                                                                                                                                                                                  get_table_projections                                                                                                                                                                                                                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Current system K is 1.
# of Nodes: 3.
Table public.customer_dimension has 2 projections.

Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
public.customer_dimension_b1 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1] [public.customer_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
public.customer_dimension_b0 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1] [public.customer_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]

(1 row)
And get the segments for b0:

Code: Select all

dbadmin=> select get_projection_segments('public.customer_dimension_b0');
-[ RECORD 1 ]-----------+---------------------------------------------------------------------------------------------------------------------
get_projection_segments | v_vmart_node0001|v_vmart_node0002|v_vmart_node0003
1431655764|2863311529|4294967294
4294967295|1431655765|2863311530

dbadmin=> select get_projection_segments('public.customer_dimension_b1');
                                               get_projection_segments
----------------------------------------------------------------------------------------------------------------------
 v_vmart_node0002|v_vmart_node0003|v_vmart_node0001
1431655764|2863311529|4294967294
4294967295|1431655765|2863311530
(1 row)
However, if you look at the documentation, you'll see that HASH actually produces a value in the range of 0 <= x < 2^63 (or a 64-bit) value, while the projection segments are 32-bit.
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “Vertica SQL”