Selecting a Random Set of Rows from a Table

Moderator: NorbertKrupa

User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

Selecting a Random Set of Rows from a Table

Post by Julie » Tue Jun 12, 2012 1:00 pm

Hi guys,

Is there a way that I can select a random set of rows from a table? For instance, if I have a million rows in a table I want to create another table that contains a random subset of 100,000 rows from the original table. I want it to be random so that I can keep repopulating the smaller table when I want a new set of data.

Thank you!
Thanks,
Juliette

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Selecting a Random Set of Rows from a Table

Post by id10t » Tue Jun 12, 2012 4:13 pm

Hi Julie!

I can't promise exact amount of rows in random sample, but some banal solution I can give for your problem:
  • Table:

    Code: Select all

    CREATE TABLE public.RandomSample
    (
        id  IDENTITY ,
        rand float
    );
    
  • Count rows:

    Code: Select all

    dbadmin=> select count(*) from RandomSample ;
      count   
    ----------
     10000000
    (1 row)
    
  • Query

    Code: Select all

    dbadmin=> select * from RandomSample where RANDOMINT(2) limit 25;
     id |      rand       
    ----+-----------------
      1 |  0.435220889408
      2 |  0.526619048598
      7 |  0.503683776469
      8 |  0.156843165799
      9 |  0.287958949609
     11 |  0.729640057579
     12 |  0.995287269836
     13 |  0.903783922664
     14 | 0.0783629890101
     15 |  0.842216552691
     21 |  0.182495597212
     25 |  0.186746456018
     26 |  0.143420991338
     29 |  0.349558490354
     30 |  0.741160671611
     33 |   0.52479647246
     34 |  0.384557384151
     35 |  0.460069846527
     37 |  0.787678300055
     40 |  0.615458029295
     41 |  0.648441913875
     45 |  0.866885151712
     47 |  0.808360821028
     48 |  0.148068713781
     49 |  0.426903519349
    (25 rows)
    
  • Filter: where RANDOMINT(2) equals to where RANDOMINT(2) = 1 or where RANDOMINT(2)::BOOLEAN = TRUE
  • Sample: LIMIT 25 promise to us that output will be equals or less than this number
You can create more complicated random FILTER, i.e:

Code: Select all

dbadmin=> select * from RandomSample where RANDOMINT(2) and (RandomSample.id) % 3 < 1 limit 25;
 id  |      rand       
-----+-----------------
   3 |  0.806599489745
   6 |  0.861293215119
   9 |  0.287958949609
  21 |  0.182495597212
  27 |  0.991428367272
  33 |   0.52479647246
  36 |  0.306133418106
  39 | 0.0674794158243
  45 |  0.866885151712
  48 |  0.148068713781
  51 |  0.648667960266
  60 |  0.444064909126
  66 |  0.947957454543
  69 |  0.444615772056
  75 |  0.510743846082
  81 |  0.836552352283
  87 |  0.624822871063
  90 |   0.95282113079
  99 | 0.0967488669246
 102 |  0.710242643209
 105 |  0.893509417363
 111 | 0.0513069083741
 114 |  0.363351906321
 117 |  0.635888079515
 120 |   0.58747541997
(25 rows)

doug_harmon
Beginner
Beginner
Posts: 36
Joined: Fri Feb 17, 2012 6:09 pm
Contact:

Re: Selecting a Random Set of Rows from a Table

Post by doug_harmon » Tue Jun 12, 2012 4:22 pm

Try this:

Code: Select all

select A.*
from (
select random() as RandomNumber, *
from <TableNameGoesHere>  ) as A
order by RandomNumber
limit 100000;

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Selecting a Random Set of Rows from a Table

Post by id10t » Tue Jun 12, 2012 9:02 pm

Just4Fun
(@doug_harmon gave good solution)

Over 1mil rows function randomint(1000000) % 10 = 7 gives better random distribution than sort by random() and returns about 100K rows. :D

Code: Select all

dbadmin=> \d floats 
                                 List of Fields by Tables
 Schema | Table  | Column | Type  | Size | Default | Not Null | Primary Key | Foreign Key 
--------+--------+--------+-------+------+---------+----------+-------------+-------------
 public | floats | id     | int   |    8 |         | t        | f           | 
 public | floats | f      | float |    8 |         | f        | f           | 
(2 rows)

Code: Select all

dbadmin=> select count(*) from floats;
  count  
---------
 1000000
(1 row)

Code: Select all

dbadmin=> select count(*) from floats where randomint((select count(*) from floats))%10 = 7;
 count  
--------
 100178
(1 row)

Code: Select all

dbadmin=> select count(*) from floats where randomint((select count(*) from floats))%10 = 7;
 count  
--------
 100117
(1 row)

Code: Select all

dbadmin=> select count(*) from floats where randomint((select count(*) from floats))%10 = 7;
 count  
--------
 100136
(1 row)

User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

Re: Selecting a Random Set of Rows from a Table

Post by Julie » Wed Jun 13, 2012 7:33 pm

sKwa, thanks for your response! Although, maybe I didn't make myself clear. :oops:

Say I have a table TABLE1 with these values:
  • Red
    Blue
    Green
    Black
    Purple
    Orange
I want to insert a random subset of that data into a new table called TABLE2.

For instance:
  • Green
    Purple
    Orange
Is there an INSERT statement I can run to randomly select three of the colors from TABLE1 and insert them into TABLE2? I want to then truncate TABLE2 and re-run the INSERT statement which should select another random three colors.
  • Blue
    Black
    Orange
If I run the INSERT again, TABLE2 might contain these rows:

I hope this makes more sense. Again, thanks for your help!
Thanks,
Juliette

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Selecting a Random Set of Rows from a Table

Post by id10t » Wed Jun 13, 2012 9:47 pm

Hi Julie!

I will try:
  • Table:

    Code: Select all

    test_db=> create table Colors (colour varchar(10));
    CREATE TABLE
    
  • Data:

    Code: Select all

    test_db=> copy Colors from stdin direct;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> Green
    >> Blue
    >> Black
    >> Orange     
    >> Magenta
    >> Yellow
    >> #FFCCBB
    >> White
    >> #DCDCDC
    >> \.
  • Table of samples:

    Code: Select all

    create table if not exists ColorsRand as select colour from Colors where RANDOMINT(2);
    CREATE TABLE
  • Show result:

    Code: Select all

    test_db=> select * from Colors left outer join ColorsRand on Colors.colour = ColorsRand.colour;
     colour  | colour  
    ---------+---------
     #DCDCDC | 
     #FFCCBB | #FFCCBB
     Black   | Black
     Blue    | Blue
     Green   | 
     Magenta | 
     Orange  | 
     White   | 
     Yellow  | 
    (9 rows)
    
  • One more time:

    Code: Select all

    truncate table ColorsRand; 
    insert into ColorsRand select colour from Colors where RANDOMINT(2); 
    select * from Colors left outer join ColorsRand on Colors.colour = ColorsRand.colour;
    

    Code: Select all

    TRUNCATE TABLE
     OUTPUT 
    --------
          5
    (1 row)
    
     colour  | colour  
    ---------+---------
     #DCDCDC | 
     #FFCCBB | #FFCCBB
     Black   | Black
     Blue    | Blue
     Green   | Green
     Magenta | Magenta
     Orange  | 
     White   | 
     Yellow  | 
    (9 rows)
    
Last edited by id10t on Wed Jun 13, 2012 10:37 pm, edited 2 times in total.

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Selecting a Random Set of Rows from a Table

Post by id10t » Wed Jun 13, 2012 10:01 pm

SELECT INTO

Code: Select all

test_db=> drop table ColorsRand ;
DROP TABLE
test_db=> select * into ColorsRand from Colors where RANDOMINT(2);
CREATE TABLE
test_db=> select * from Colors left outer join ColorsRand on Colors.colour = ColorsRand.colour;
 colour  | colour  
---------+---------
 #DCDCDC | 
 #FFCCBB | 
 Black   | 
 Blue    | Blue
 Green   | 
 Magenta | Magenta
 Orange  | Orange
 White   | White
 Yellow  | Yellow
(9 rows)
But in second time it will not work:

Code: Select all

test_db=> select * into ColorsRand from Colors where RANDOMINT(2);
ROLLBACK 4213:  Object "ColorsRand" already exists
So the same way: TRUNCATE -> INSERT ... RANDOM FILTER

Post Reply

Return to “Vertica SQL”