Page 1 of 1
Counting Based on Date between Two Dates
Posted: Fri Jul 24, 2015 9:56 am
by Razgriz
Hello,
I have a table with columns ACCOUNT_ID, ACTIVE_DATE and INACTIVE_DATE.
Code: Select all
ACCOUNT_ID| ACTIVE_DATE| INACTIVE_DATE
USER1 05/01/15 05/05/15
USER3 05/01/15 05/20/15
USER2 05/07/15 05/20/15
USER1 05/06/15 05/20/15
I can count the number of account_ids where a date is between the active_date and inactive_date for a specific point in time:
Code: Select all
select
count(distinct accountid)
from table
where
'05/05/15'
between ACTIVE_DATE
and INACTIVE_DATE
If I continue to change the date on 5/6/15 I would get 2 users and on 5/7 I would get 3 users.
My question is, how do write this looping test in SQL so I can move down a series of dates and count the number of users that are between the active and inactive dates?
Re: Counting Based on Date between Two Dates
Posted: Sat Jul 25, 2015 3:18 pm
by JimKnicely
Hi,
Vertica has a lot of nice built in analytic functions! I think the TIMESERIES clause, which provides gap-filling and interpolation (GFI) computation, will help you out.
Please take a look at the documentation here:
http://my.vertica.com/docs/7.1.x/HTML/i ... Clause.htm
Example:
Code: Select all
dbadmin=> SELECT * FROM test;
account_id | active_date | inactive_date
------------+-------------+---------------
USER1 | 2015-05-01 | 2015-05-05
USER1 | 2015-05-06 | 2015-05-20
USER2 | 2015-05-07 | 2015-05-20
USER3 | 2015-05-01 | 2015-05-20
(4 rows)
Code: Select all
dbadmin=> SELECT foo2.ts, COUNT(DISTINCT account_id)
dbadmin-> FROM test
dbadmin-> CROSS JOIN (SELECT ts::DATE
dbadmin(> FROM (SELECT MIN(active_date)::TIMESTAMP AS tm FROM test
dbadmin(> UNION
dbadmin(> SELECT MAX(inactive_date)::TIMESTAMP AS tm FROM test) AS foo
dbadmin(> TIMESERIES ts AS '1 DAY' OVER (ORDER BY tm)) foo2
dbadmin-> WHERE foo2.ts BETWEEN active_date AND inactive_date
dbadmin-> GROUP BY foo2.ts
dbadmin-> ORDER BY foo2.ts;
ts | COUNT
------------+-------
2015-05-01 | 2
2015-05-02 | 2
2015-05-03 | 2
2015-05-04 | 2
2015-05-05 | 2
2015-05-06 | 2
2015-05-07 | 3
2015-05-08 | 3
2015-05-09 | 3
2015-05-10 | 3
2015-05-11 | 3
2015-05-12 | 3
2015-05-13 | 3
2015-05-14 | 3
2015-05-15 | 3
2015-05-16 | 3
2015-05-17 | 3
2015-05-18 | 3
2015-05-19 | 3
2015-05-20 | 3
(20 rows)
Re: Counting Based on Date between Two Dates
Posted: Sat Jul 25, 2015 5:07 pm
by id10t
Hi!
knicely87 wrote: I think the TIMESERIES clause, which provides gap-filling and interpolation (GFI) computation, will help you out.
And use it on your own risk, because this functionality is buggy.
https://community.dev.hp.com/t5/Vertica ... true#M2579
Example:
(why it starts from timestamp "2013-02-16 18:30:00"?)
daniel=> \set min '''2013-02-26 00:00:00'''
daniel=> \set max '''2013-09-11 00:00:00'''
daniel=> \set interval '''14 DAYS 12 HOURS 47 MINUTES'''
daniel=> SELECT ts
daniel-> FROM (SELECT :min::TIMESTAMP as tm
daniel(> UNION
daniel(> SELECT :max::TIMESTAMP as tm) as t
daniel-> TIMESERIES ts as :interval OVER (ORDER BY tm);
ts
---------------------
2013-02-16 18:30:00
2013-03-03 07:17:00
2013-03-17 20:04:00
2013-04-01 08:51:00
2013-04-15 21:38:00
2013-04-30 10:25:00
2013-05-14 23:12:00
2013-05-29 11:59:00
2013-06-13 00:46:00
2013-06-27 13:33:00
2013-07-12 02:20:00
2013-07-26 15:07:00
2013-08-10 03:54:00
2013-08-24 16:41:00
2013-09-08 05:28:00
(15 rows)
Re: Counting Based on Date between Two Dates
Posted: Sun Jul 26, 2015 3:22 am
by JimKnicely
Hi,
Maybe a better solution would be to use a date dimension table?
Code: Select all
dbadmin=> CREATE TABLE date_dim (the_date date);
CREATE TABLE
Code: Select all
dbadmin=> INSERT INTO date_dim
dbadmin-> SELECT the_date
dbadmin-> FROM (SELECT ('05/01/15'::DATE + (row_number() over () - 1)) the_date
dbadmin(> FROM tables CROSS JOIN columns) foo
dbadmin-> WHERE the_date <= '05/20/15'::DATE;
OUTPUT
--------
20
(1 row)
Code: Select all
dbadmin=> SELECT * FROM date_dim;
the_date
------------
2015-05-01
2015-05-02
2015-05-03
2015-05-04
2015-05-05
2015-05-06
2015-05-07
2015-05-08
2015-05-09
2015-05-10
2015-05-11
2015-05-12
2015-05-13
2015-05-14
2015-05-15
2015-05-16
2015-05-17
2015-05-18
2015-05-19
2015-05-20
(20 rows)
Code: Select all
dbadmin=> SELECT dd.the_date, COUNT(DISTINCT account_id)
dbadmin-> FROM test
dbadmin-> CROSS JOIN date_dim dd
dbadmin-> WHERE dd.the_date BETWEEN active_date AND inactive_date
dbadmin-> GROUP BY dd.the_date
dbadmin-> ORDER BY dd.the_date;
the_date | COUNT
------------+-------
2015-05-01 | 2
2015-05-02 | 2
2015-05-03 | 2
2015-05-04 | 2
2015-05-05 | 2
2015-05-06 | 2
2015-05-07 | 3
2015-05-08 | 3
2015-05-09 | 3
2015-05-10 | 3
2015-05-11 | 3
2015-05-12 | 3
2015-05-13 | 3
2015-05-14 | 3
2015-05-15 | 3
2015-05-16 | 3
2015-05-17 | 3
2015-05-18 | 3
2015-05-19 | 3
2015-05-20 | 3
(20 rows)
Re: Counting Based on Date between Two Dates
Posted: Sun Jul 26, 2015 8:20 am
by id10t
Hi!
knicely87 wrote:Maybe a better solution would be to use a date dimension table?
A better solution is - to push on Vertica developers to fix a bug, otherwise it's not a solution, it's a workaround.
Re: Counting Based on Date between Two Dates
Posted: Thu Sep 03, 2015 6:41 pm
by Razgriz
Thanks everyone for their help!