Page 1 of 1

Using hash function to determine which node data gets segmented on

Posted: Thu Sep 17, 2015 7:01 am
by personwithhair
I'm curious if there is a way using the hash function or something else to determine exactly which node a particular value will get segmented on? In Teradata you would use hashamp(), for example, to determine this.

Thanks!

Re: Using hash function to determine which node data gets segmented on

Posted: Thu Sep 17, 2015 7:22 pm
by NorbertKrupa
If you wanted to implement using HASH, you would have to reach out to Vertica as it's still proprietary. There are plans to make this function open source in an upcoming release. Until then, you can use a custom immutable function such as ABS(FLOOR(column/2)).

However, you would use get_projection_segments to get the ranges for each node.

Code: Select all

dbadmin=> select get_table_projections('public.customer_dimension');
                                                                                                                                                                                                                                                                                                                                  get_table_projections                                                                                                                                                                                                                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Current system K is 1.
# of Nodes: 3.
Table public.customer_dimension has 2 projections.

Projection Name: [Segmented] [Seg Cols] [# of Buddies] [Buddy Projections] [Safe] [UptoDate] [Stats]
----------------------------------------------------------------------------------------------------
public.customer_dimension_b1 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1] [public.customer_dimension_b0] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]
public.customer_dimension_b0 [Segmented: Yes] [Seg Cols: "public.customer_dimension.customer_key"] [K: 1] [public.customer_dimension_b1] [Safe: Yes] [UptoDate: Yes] [Stats: RowCounts]

(1 row)
And get the segments for b0:

Code: Select all

dbadmin=> select get_projection_segments('public.customer_dimension_b0');
-[ RECORD 1 ]-----------+---------------------------------------------------------------------------------------------------------------------
get_projection_segments | v_vmart_node0001|v_vmart_node0002|v_vmart_node0003
1431655764|2863311529|4294967294
4294967295|1431655765|2863311530

dbadmin=> select get_projection_segments('public.customer_dimension_b1');
                                               get_projection_segments
----------------------------------------------------------------------------------------------------------------------
 v_vmart_node0002|v_vmart_node0003|v_vmart_node0001
1431655764|2863311529|4294967294
4294967295|1431655765|2863311530
(1 row)
However, if you look at the documentation, you'll see that HASH actually produces a value in the range of 0 <= x < 2^63 (or a 64-bit) value, while the projection segments are 32-bit.