Page 1 of 2

Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 2:17 pm
by heather
Hi,

I am experiencing some really weird behavior with the RANDOMINT function.

I have a table named test_date which has two columns. A date_key which is just a number representing a date (YYYYMMDD) and a column called rn (row number).

Code: Select all

dbadmin=> select * from test_date order by 1, 2;
 date_key |  rn  
----------+------
 19200101 |    1
 19200102 |    2
 19200103 |    3
 19200104 |    4
 19200105 |    5
 19200106 |    6
 19200107 |    7
 19200108 |    8
 19200109 |    9
 19200110 |   10
...
 19991221 | 29210
 19991222 | 29211
 19991223 | 29212
 19991224 | 29213
 19991225 | 29214
 19991226 | 29215
 19991227 | 29216
 19991228 | 29217
 19991229 | 29218
 19991230 | 29219
 19991231 | 29220
 20000101 | 29221

Code: Select all

dbadmin=> select min(date_key), max(date_key), min(rn), max(rn) from test_date;
   min    |   max    | min |  max  
----------+----------+-----+-------
 19200101 | 20000101 |   1 | 29221
(1 row)
I want to select a random date_key from the table. I tried this:

Code: Select all

dbadmin=> select date_key, rn from test_date where rn = (1 + randomint(29221));
 date_key |  rn   
----------+-------
 19790214 | 21595
(1 row)
It works most of the time, but sometimes I get more than one row:

Code: Select all

dbadmin=> select date_key, rn from test_date where rn = (1 + randomint(29221));
 date_key |  rn   
----------+-------
 19240302 |  1523
 19900613 | 25732
(2 rows)
And sometimes I get no rows:

Code: Select all

dbadmin=> select date_key, rn from test_date where rn = (1 + randomint(29221));
 date_key | rn 
----------+----
(0 rows)
How is that possible?

There are no duplicates:

Code: Select all

dbadmin=> select date_key, count(*) from test_date group by date_key having count(*) > 1;
 date_key | count 
----------+-------
(0 rows)

dbadmin=> select rn, count(*) from test_date group by rn having count(*) > 1;
 rn | count 
----+-------
(0 rows)
Can someone explain what is happening?

Thanks! :? :? :?

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 2:38 pm
by JimKnicely
Hi Heather,

Try encapsulating the the RANDOMINT function into its own SELECT:

Like this:

Code: Select all

select date_key from test_date where rn = (select (1 + randomint(29221)));
Does that help?

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 4:04 pm
by heather
Thanks, Jim! That seemed to work... but not when it is part of an update statement.

Code: Select all

dbadmin=> select * from rand_test;
 id | rn 
----+----
  1 |   
  2 |   
  3 |   
(3 rows)

dbadmin=> update rand_test set rn = randomint(10);
 OUTPUT 
--------
      3
(1 row)

dbadmin=> select * from rand_test;
 id | rn 
----+----
  1 |  5
  2 |  2
  3 |  1
(3 rows)
That update worked, but not this one:

Code: Select all

dbadmin=> update rand_test set rn = (select randomint(10));
 OUTPUT 
--------
      3
(1 row)

dbadmin=> select * from rand_test;
 id | rn 
----+----
  1 |  9
  2 |  9
  3 |  9
(3 rows)

dbadmin=> update rand_test set rn = (select randomint(10));
 OUTPUT 
--------
      3
(1 row)

dbadmin=> select * from rand_test;
 id | rn 
----+----
  1 |  8
  2 |  8
  3 |  8
(3 rows)
I get the same "random" number for every row?

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 4:13 pm
by id10t
Hi!

It's interesting.
I have only 2 ideas: table has mutable(not stable) default value in definitions or Vertica's reduce function work not properly.

1. Can you post a table definition? I will do a test on my system.
2. What is Vertica version?
3. Can you post PROFILE somewhere (pastebin for example)? I want to check query parallelism.

PS
If query parallelized so 2 answers probably comes from separate streams("reduce" didn't filter one answer). And if "reduce" work not properly so it can be reason for "No answer".

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 4:14 pm
by JimKnicely
Heather,

That sub query will only get evaluated once (first) for every row you are updating. Try correlating it back to the table you are updating.

Like this:

Code: Select all

update rand_test a set rn = (select randomint(10) from rand_test b where b.id = a.id);
Does that work for you? That should cause the sub query to be evaluated for each row you are updating.

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 4:19 pm
by heather
sKwa,

Here are the details:

Code: Select all

dbadmin=> select version();
              version               
------------------------------------
 Vertica Analytic Database v6.1.2-0
(1 row)

Code: Select all

dbadmin=> \dt rand_test;
                 List of tables
 Schema |   Name    | Kind  |  Owner  | Comment 
--------+-----------+-------+---------+---------
 public | rand_test | table | dbadmin | 
(1 row)

Code: Select all

dbadmin=> \d rand_test;
                                  List of Fields by Tables
 Schema |   Table   | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key 
--------+-----------+--------+------+------+---------+----------+-------------+-------------
 public | rand_test | id     | int  |    8 |         | f        | f           | 
 public | rand_test | rn     | int  |    8 |         | f        | f           | 
(2 rows)

Code: Select all

dbadmin=> profile update rand_test a set rn = (select randomint(10));
NOTICE 4788:  Statement is being profiled
HINT:  Select * from v_monitor.execution_engine_profiles where transaction_id=45035996273714273 and statement_id=41;
NOTICE 3557:  Initiator memory for query: [on pool general: 293851 KB, minimum: 293851 KB]
NOTICE 5077:  Total memory required by query: [293851 KB]
 OUTPUT 
--------
      3
(1 row)

Re: Weird behavior with RANDOMINT

Posted: Tue Sep 24, 2013 4:23 pm
by heather
Jim,

I just saw your reply. That did work! But it seems weird to me.

Code: Select all

dbadmin=> update rand_test a set rn = (select randomint(10) from rand_test b where b.id = a.id);
 OUTPUT 
--------
      3
(1 row)

dbadmin=> select * from rand_test;
 id | rn 
----+----
  1 |  0
  2 |  8
  3 |  9
(3 rows)
Hmm. Is there some sort of hint we can pass in the sub-query to say "please evaluate me for every row". I think that exists in Oracle.

Thanks for all your help guys. This has been driving me crazy for days!!!