Converting Columns to Rows

Moderator: NorbertKrupa

Post Reply
User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Converting Columns to Rows

Post by JimKnicely » Fri Feb 15, 2013 5:42 pm

Hi!

Say we have the following data in a Vertica database table named TEST:

Code: Select all

dbadmin=> SELECT * FROM test;
 name | c1 | c2 | c3
------+----+----+----
 PENG |  1 |  5 |  7
 TOM  |  5 |  3 |  2
 JIM  |  2 |  4 |  6
(3 rows)
How would we calculate the average value of columns C1, C2 and C3 for each row in the table? Maybe something like the following query will help…

Code: Select all

dbadmin=> SELECT AVG(C1, C2, C3)
dbadmin->   FROM test
dbadmin->  GROUP
dbadmin->     BY name;
ERROR 3457:  Function AVG(int, int, int) does not exist, or permission is denied for AVG(int, int, int)
HINT:  No function matches the given name and argument types. You may need to add explicit type casts
Oops, the AVG() function can only accept one argument as it works to average row values, not multiple column values.

So we need to somehow convert the columns C1, C2 and C3 into rows so that the AVG() function can do its thing!

Oracle 11g has a great feature in the UNPIVOT operator that can be used to convert column-based data into separate rows.

Example from Oracle:

Code: Select all

SQL> SELECT * FROM test;

NAME          C1         C2         C3
----- ---------- ---------- ----------
JIM            2          4          6
PENG           1          5          7
TOM            5          3          2

3 rows selected.

SQL> SELECT name,
  2         AVG(value)
  3    FROM test
  4     UNPIVOT
  5       EXCLUDE NULLS
  6         (VALUE FOR PROPERTY IN (c1, c2, c3))
  7   GROUP
  8      BY name
  9   ORDER
 10      BY name;

NAME  AVG(VALUE)
----- ----------
JIM            4
PENG  4.33333333
TOM   3.33333333

3 rows selected.
But how can we pull this feat off in Vertica where the UNPIVOT operator is non-existent? Well, it’s a little more tedious, but it is possible.

Example in Vertica:

Code: Select all

dbadmin=> SELECT * FROM test;
 name | c1 | c2 | c3
------+----+----+----
 PENG |  1 |  5 |  7
 TOM  |  5 |  3 |  2
 JIM  |  2 |  4 |  6
(3 rows)

dbadmin=> SELECT foo3.name,
dbadmin->        foo2.c_avg
dbadmin->   FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(>           FROM (SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c1 c FROM test
dbadmin(>                  UNION ALL
dbadmin(>                 SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c2 c FROM test
dbadmin(>                  UNION ALL
dbadmin(>                 SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c3 c FROM test) foo
dbadmin(>           GROUP BY name,
dbadmin(>                    rn) foo2
dbadmin->   JOIN (SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c1, c2, c3 FROM test) foo3
dbadmin->     ON foo3.name = foo2.name
dbadmin->    AND foo3.rn = foo2.rn
dbadmin-> ORDER
dbadmin->     BY foo3.name,
dbadmin->        foo3.rn;
 name |      c_avg
------+------------------
 JIM  |                4
 PENG | 4.33333333333333
 TOM  | 3.33333333333333
(3 rows)
The core of the column to row conversion is handled by the virtual table aliased as FOO. Each SELECT represents one of the columns to be converted to a row, and they are glued together via the UNION ALL statements. If we add a new column to the TEST table and want it included in the average calculation, we’ll have to add another SELECT to the FOO table.

Instead of retyping that long SQL command every time the data changes and we need to recalculate the average, we can create a database VIEW.

Code: Select all

dbadmin=> CREATE OR REPLACE VIEW test_col_avg_vw AS
dbadmin->   SELECT foo3.name,
dbadmin->          foo3.c1,
dbadmin->          foo3.c2,
dbadmin->          foo3.c3,
dbadmin->          foo2.c_avg AS col_avg
dbadmin->     FROM (SELECT name, rn, AVG(c) c_avg
dbadmin(>             FROM (SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c1 c FROM test
dbadmin(>                    UNION ALL
dbadmin(>                   SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c2 c FROM test
dbadmin(>                    UNION ALL
dbadmin(>                   SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c3 c FROM test) foo
dbadmin(>             GROUP BY name,
dbadmin(>                      rn) foo2
dbadmin->     JOIN (SELECT name, row_number() over (ORDER BY c1, c2, c3) rn, c1, c2, c3 FROM test) foo3
dbadmin->       ON foo3.name = foo2.name
dbadmin->      AND foo3.rn = foo2.rn
dbadmin->   ORDER
dbadmin->       BY foo3.name,
dbadmin->          foo3.rn;
CREATE VIEW
Querying the VIEW is a lot simpler!

Code: Select all

dbadmin=> INSERT INTO test VALUES ('PATRICK', 4, 9, 3);
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT * FROM test_col_avg_vw;
  name   | c1 | c2 | c3 |     col_avg
---------+----+----+----+------------------
 JIM     |  2 |  4 |  6 |                4
 PATRICK |  4 |  9 |  3 | 5.33333333333333
 PENG    |  1 |  5 |  7 | 4.33333333333333
 TOM     |  5 |  3 |  2 | 3.33333333333333
(4 rows)
Have fun!
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Converting Columns to Rows

Post by id10t » Fri Oct 04, 2013 2:55 pm

Hi Jim!

Here I found one more SQL solution - with cross join.

Post Reply

Return to “Vertica Tips, Lessons and Examples”