Page 1 of 3

Help with Code

Posted: Sun Jul 19, 2015 9:25 pm
by Vertica_grm
Hey Jim,

could you please guide me here:

I have a below table.

UserID Timestamp Event
1234 4/17/2014 22:00 A
1234 5/1/2014 21:46 A
1234 5/8/2014 18:33 A
1234 5/8/2014 18:49 A
1234 5/12/2014 21:19 B
1234 5/12/2014 21:19 B
1234 5/8/2014 21:19 A
1234 5/12/2014 21:20 B
1234 5/12/2014 21:21 B

i would like to find: For Every timestamp of event B find the count of A which happened before B's timestamp and after B which might have happened perviously. E.g. for Second occurence of B count should be 0 because there was no A between B and the previous timestamp of B.

Final output should be like this:

UserID Timestamp Event Count
1234 4/17/2014 22:00 A -
1234 5/1/2014 21:46 A -
1234 5/8/2014 18:33 A -
1234 5/8/2014 18:49 A -
1234 5/12/2014 21:19 B 4
1234 5/12/2014 21:19 B 0
1234 5/8/2014 21:19 A -
1234 5/12/2014 21:20 B 1
1234 5/12/2014 21:21 B 0

Re: Help with Code

Posted: Mon Jul 20, 2015 3:21 am
by NorbertKrupa
Can you show us what you tried?

Re: Help with Code

Posted: Mon Jul 20, 2015 3:46 am
by JimKnicely
There are probably a bunch of ways to solve this. Here is my first attempt :)

Code: Select all

dbadmin=> SELECT * FROM test ORDER BY pk;
 pk | user_id |     time_stamp      | event
----+---------+---------------------+-------
  1 |    1234 | 2014-04-17 22:00:00 | A
  2 |    1234 | 2014-05-01 21:46:00 | A
  3 |    1234 | 2014-05-08 18:33:00 | A
  4 |    1234 | 2014-05-08 18:49:00 | A
  5 |    1234 | 2014-05-12 21:19:00 | B
  6 |    1234 | 2014-05-12 21:19:00 | B
  7 |    1234 | 2014-05-08 21:19:00 | A
  8 |    1234 | 2014-05-12 21:20:00 | B
  9 |    1234 | 2014-05-12 21:21:00 | B
(9 rows)
Note that I added a pk column to keep your ordering since your data is not sorted.

Code: Select all

dbadmin=> SELECT user_id AS "UserID", time_stamp AS "Timestamp", event AS "Event",
dbadmin->        CASE WHEN event = 'B' THEN LAG(for_lag) OVER (ORDER BY pk)::VARCHAR ELSE '-' END AS "Count"
dbadmin->   FROM(SELECT pk, user_id, time_stamp, event, cce,
dbadmin(>               SUM(ctr) OVER (PARTITION BY user_id, cce ORDER BY pk) for_lag
dbadmin(>          FROM (SELECT pk, user_id, time_stamp, event,
dbadmin(>                       CASE WHEN event = 'A' THEN 1 ELSE 0 END AS ctr,
dbadmin(>                       CONDITIONAL_CHANGE_EVENT(event) OVER (ORDER BY pk) AS cce
dbadmin(>                  FROM test) foo) foo2
dbadmin->  ORDER BY pk;
 UserID |      Timestamp      | Event | Count
--------+---------------------+-------+-------
   1234 | 2014-04-17 22:00:00 | A     | -
   1234 | 2014-05-01 21:46:00 | A     | -
   1234 | 2014-05-08 18:33:00 | A     | -
   1234 | 2014-05-08 18:49:00 | A     | -
   1234 | 2014-05-12 21:19:00 | B     | 4
   1234 | 2014-05-12 21:19:00 | B     | 0
   1234 | 2014-05-08 21:19:00 | A     | -
   1234 | 2014-05-12 21:20:00 | B     | 1
   1234 | 2014-05-12 21:21:00 | B     | 0
(9 rows)
This may get you thinking about alternatives...

Re: Help with Code

Posted: Mon Jul 20, 2015 7:26 pm
by Vertica_grm
Thanks Jim,Really appreciate it.

That worked.

Re: Help with Code

Posted: Wed Jul 22, 2015 9:55 pm
by Vertica_grm
If I have result set from a subquery as
UserID ColumnA Column B Percent
1 1 5 0.2
2 2 5 0.4
3 1 4 0.25
7 0 1 0
8 0 1 0


How do I add another Column (5th one below) which is basically SUM(ColumnA)/SUM(COLUMNB)

UserID ColumnA Column B % Group AVG Optimum
1 2 5 0.4 0.3125 Yes
2 2 5 0.4 0.3125 Yes
3 1 4 0.25 0.3125 Yes
7 0 1 0 0.3125 No
8 0 1 0 0.3125 No

Issue: I need first 4 column in the final result. since they will be in select statement, I will need to groupby by those column also which messes up my result on SUM function.

VSQL isn't terribly helpful in temp tables or variables. What would be the best way out?

Re: Help with Code

Posted: Thu Jul 23, 2015 12:22 am
by JimKnicely
Hi,

I think the analytic function SUM() is what you are looking for. You should check out the docs and read about analytic functions. It's one of the areas where Vertica really, really shines!

http://my.vertica.com/docs/7.1.x/HTML/i ... s%7C_____0

Below are examples using your data sets:

Code: Select all

dbadmin=> SELECT userid, column_a, column_b,
dbadmin->        ROUND(column_a / column_b, 4.0) "%",
dbadmin->        ROUND(SUM(column_a) over () / SUM(column_b) over (), 4.0) "Group AVG"
dbadmin->   FROM test1 ORDER BY userid;

 userid | column_a | column_b |  %   | Group AVG
--------+----------+----------+------+-----------
      1 |        1 |        5 |  0.2 |      0.25
      2 |        2 |        5 |  0.4 |      0.25
      3 |        1 |        4 | 0.25 |      0.25
      7 |        0 |        1 |    0 |      0.25
      8 |        0 |        1 |    0 |      0.25
(5 rows)

Code: Select all

dbadmin=>
dbadmin=> SELECT userid, column_a, column_b,
dbadmin->        ROUND(column_a / column_b, 4.0) "%",
dbadmin->        ROUND(SUM(column_a) over () / SUM(column_b) over (), 4.0) "Group AVG"
dbadmin->   FROM test2 ORDER BY userid;
 userid | column_a | column_b |  %   | Group AVG
--------+----------+----------+------+-----------
      1 |        2 |        5 |  0.4 |    0.3125
      2 |        2 |        5 |  0.4 |    0.3125
      3 |        1 |        4 | 0.25 |    0.3125
      7 |        0 |        1 |    0 |    0.3125
      8 |        0 |        1 |    0 |    0.3125
(5 rows)

Re: Help with Code

Posted: Thu Jul 23, 2015 3:05 pm
by Vertica_grm
Thank you JIm. You're the best! :D :D :D