How many customers where active last 365 days?
We use this as a "churn" KPI.
Say 3,456,466 customers made at least one purchase between yesterday - 365 and yesterday.
Now we run the following query every day just to see for one day at a time or populate a spreadsheet:
SELECT
COUNT(DISTINCT CUSTOMER_KEY) AS "Customers"
FROM F_DAY_SALES
WHERE DATE_KEY BETWEEN CURRENT_DATE - 366 AND CURRENT_DATE - 1
COUNT DISTINCT don't take the UNBOUNDED PRECEDING | FOLLOWING (?)
Is there a way to get something like
Date, Customers
2016-01-01 3,456,466
2016-01-02 3,456,578
2016-01-03 3,457,897
.
.
.
Where the numbers are COUNT(DISTINCT CUSTOMER_KEY) OVER (PARTITION BY DATE_KEY ORDER BY DATE_KEY ROWS BETWEEN 365 PRECEDING AND 0 FOLLOWING)?
Best regards!
P
Running Count Distinct
Moderator: NorbertKrupa
Re: Running Count Distinct
I just submitted this exact request a few weeks ago through Product Management. The more customers who request it, the more likely it is to be implemented. Open a ticket asking for it.
—Sharon
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Running Count Distinct
Hi,
Are you sure you want to use an analytic function? Looking at your expected output, it seems you just want a distinct customer count for every date. Maybe just using a simple GROUP BY is sufficient?
Example:
Note that I used the ADD_MONTHS function to look back one year (instead of subtracting 365) to account for leap years. You may not want this, but just a thought.
Are you sure you want to use an analytic function? Looking at your expected output, it seems you just want a distinct customer count for every date. Maybe just using a simple GROUP BY is sufficient?
Example:
Code: Select all
dbadmin=> select * from F_DAY_SALES order by 1;
date_key | customer_key
----------+--------------
20160101 | 1
20160101 | 1
20160101 | 1
20160101 | 2
20160101 | 3
20160101 | 4
20160102 | 1
20160102 | 5
20160103 | 1
20160103 | 2
20160103 | 2
20160103 | 3
20160103 | 3
(13 rows)
dbadmin=> select to_date(date_key::varchar, 'YYYYMMDD') date, count(distinct customer_key) customers from F_DAY_SALES where to_date(date_key::varchar, 'YYYYMMDD') between add_months((now()-1)::date, -12) and (now()-1)::date group by date_key order by 1;
date | customers
------------+-----------
2016-01-01 | 4
2016-01-02 | 2
2016-01-03 | 3
(3 rows)
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Running Count Distinct
Unless you are looking to get a "running" distinct count of customer keys. Maybe you can try a query in the example below. It is using a 10 day window to keep the result set small.
Example:
I think this works and is what you are looking for? You can modify the the predicate "where foo.the_date between sysdate::date-10 and sysdate::date" to be "where foo.the_date between sysdate::date-365 and sysdate::date" or to what ever you want.
This query might not perform all that well on large data sets. I'm with Sharon in that it will be nice when the analytic functions support the distinct key word!
Example:
Code: Select all
dbadmin=> select * from F_DAY_SALES order by 1;
date_key | customer_key | date_key_date
----------+--------------+---------------
20161004 | 1 | 2016-10-04
20161004 | 1 | 2016-10-04
20161004 | 2 | 2016-10-04
20161004 | 3 | 2016-10-04
20161005 | 3 | 2016-10-05
20161006 | 3 | 2016-10-06
20161006 | 4 | 2016-10-06
20161008 | 1 | 2016-10-08
20161008 | 10 | 2016-10-08
20161009 | 10 | 2016-10-09
20161011 | 1 | 2016-10-11
20161011 | 2 | 2016-10-11
20161011 | 15 | 2016-10-11
20161012 | 100 | 2016-10-12
(14 rows)
dbadmin=> select the_date date,
dbadmin-> customer_key
dbadmin-> from (select b.ts the_date,
dbadmin(> count(distinct a.customer_key) customer_key
dbadmin(> from (SELECT ts
dbadmin(> FROM (SELECT '1900-01-01'::timestamp AS tm
dbadmin(> UNION ALL
dbadmin(> SELECT '2100-01-01'::timestamp AS tm) AS t
dbadmin(> TIMESERIES ts AS '1 day' OVER (ORDER BY tm)) b
dbadmin(> left join F_DAY_SALES a
dbadmin(> on a.date_key_date <= b.ts
dbadmin(> group by b.ts) foo
dbadmin-> where foo.the_date between sysdate::date-10 and sysdate::date
dbadmin-> order by foo.the_date;
date | customer_key
---------------------+--------------
2016-10-03 00:00:00 | 0
2016-10-04 00:00:00 | 3
2016-10-05 00:00:00 | 3
2016-10-06 00:00:00 | 4
2016-10-07 00:00:00 | 4
2016-10-08 00:00:00 | 5
2016-10-09 00:00:00 | 5
2016-10-10 00:00:00 | 5
2016-10-11 00:00:00 | 6
2016-10-12 00:00:00 | 7
(10 rows)
This query might not perform all that well on large data sets. I'm with Sharon in that it will be nice when the analytic functions support the distinct key word!
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.