The WIDTH_BUCKET Function

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

The WIDTH_BUCKET Function

Post by JimKnicely » Fri Aug 10, 2012 1:20 pm

The WIDTH_BUCKET function constructs equal width histograms, in which the histogram range is divided into intervals (buckets) of identical sizes.

The syntax of the function is:
  • WIDTH_BUCKET ( expression, hist_min, hist_max, bucket_count )
Example:

Pretend we have a table named TV_SHOW that stores TV show names and their corresponding average viewer age:

Code: Select all

dbadmin=> SELECT name, age_grp
dbadmin->   FROM tv_show
dbadmin->  ORDER BY name;
         name         | age_grp
----------------------+---------
60 Minutes            |     100
American Idol         |      25
Andy Griffith Show    |      91
Arrested Development  |      32
Blue Bloods           |      63
CSI: NY               |      58
Dallas                |      33
Dancing with DBAs     |      42
Family Guy            |      18
Golden Girls          |      63
Magnum P.I.           |      60
Modern Family         |      32
Murder She Wrote      |      53
SpongeBob SquarePants |      11
Star Trek             |      28
Teen Wolf             |      16
Wheel of Fortune      |      42
Wiggles               |       7
(18 rows)
Say we want to group the viewers into the following ten age groups:

Code: Select all

  AGE		 AGE GROUP
-------     ---------
00 – 09         1
10 – 19         2
20 – 29         3
30 – 39         4
40 – 49         5
50 – 59         6
60 – 69         7
70 – 79         8
80 – 89         9
90 - 99        10
We can do this easily with the WIDTH_BUCKET function:

Code: Select all

dbadmin=> SELECT name, age_grp, width_bucket(age_grp, 0, 99, 10)
dbadmin->   FROM tv_show
dbadmin->  ORDER BY name;
         name          | age_grp | width_bucket
-----------------------+---------+--------------
60 Minutes            |     100 |           11
American Idol         |      25 |            3
Andy Griffith Show    |      91 |           10
Arrested Development  |      32 |            4
Blue Bloods           |      63 |            7
CSI: NY               |      58 |            6
Dallas                |      33 |            4
Dancing with DBAs     |      42 |            5
Family Guy            |      18 |            2
Golden Girls          |      63 |            7
Magnum P.I.           |      60 |            7
Modern Family         |      32 |            4
Murder She Wrote      |      53 |            6
SpongeBob SquarePants |      11 |            2
Star Trek             |      28 |            3
Teen Wolf             |      16 |            2
Wheel of Fortune      |      42 |            5
Wiggles               |       7 |            1
(18 rows)
Note: The WIDTH_BUCKET function will create an over flow bucket automatically, hence “60 Minutes” falling into group 11…

Have fun!
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica Tips, Lessons and Examples”