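How to compute a 30-day rolling count of distinct IDs
So, after looking at what seems to be a common problem and not finding any solution that works for me, I decided to ask it myself.
I have a dataset with two columns: session_start_time and uid.
I am trying to produce a rolling 30-day tally of unique sessions, i.e. distinct uid values.
Querying the number of unique uids over the last 30 days is simple enough: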
SELECT
COUNT(DISTINCT(uid))
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '30 days'
Counting daily unique uids over a date range is also fairly simple:
SELECT
DATE_TRUNC('day',session_start_time) AS "date",COUNT(DISTINCT uid) AS "count"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day',session_start_time)
I then tried several ways to compute a rolling 30-day unique count over a time range, for example:
SELECT
DATE(session_start_time) AS "running30day",COUNT(distinct(
case when date(session_start_time) >= running30day - interval '30 days'
AND date(session_start_time) <= running30day
then uid
end)
) AS "unique_30day"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '3 months'
GROUP BY date(session_start_time)
Order BY running30day desc
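I really thought this would work, but when I inspected the results I got the same numbers as the daily unique count, not a 30-day unique count.
I'm writing this query in Metabase's SQL query editor. The underlying table is in Redshift.
If you've read this far, thank you. Your time is valuable and I appreciate you taking the time to read my question.
Edit: As rightly requested, I've added an example of the dataset I'm working with and the desired results.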
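Why the attempt returns the daily numbers: `running30day` is just `DATE(session_start_time)` of the current row. With `GROUP BY date(session_start_time)`, every row in a group has that same date, so the `CASE` condition is always true and `COUNT(DISTINCT CASE ...)` collapses to the daily `COUNT(DISTINCT uid)`. A rolling window has to compare each reporting date with the dates of *other* rows, for example through a self-join or enter/leave counters. The sketch below checks this claim with Python's built-in `sqlite3` module on the sample rows posted in the edit. It makes several assumptions. Timestamps are reduced to calendar dates. The alias is written out for the SQLite port. `date(x, '-30 day')` stands in for Redshift's `- interval '30 days'`.

```python
import sqlite3

# Sample rows from the question, reduced to calendar dates (assumption:
# time of day and UTC offsets do not matter for this check).
SESSIONS = [
    (10, "2020-01-13"), (5, "2020-01-13"), (3, "2020-01-18"),
    (9, "2020-03-06"), (2, "2020-03-06"), (8, "2020-03-31"),
    (3, "2020-08-28"), (2, "2020-08-28"), (9, "2020-08-28"), (3, "2020-08-28"),
    (8, "2020-09-15"), (3, "2020-09-21"),
    (1, "2020-11-05"), (6, "2020-11-05"),
    (8, "2020-12-12"), (8, "2020-12-12"), (5, "2020-12-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_sessions (uid INTEGER, session_start_time TEXT)")
conn.executemany("INSERT INTO users_sessions VALUES (?, ?)", SESSIONS)

# The attempted query with the running30day alias written out. Within one
# GROUP BY day, each row's date is the group's date, so the CASE never
# filters anything out.
ATTEMPT = """
SELECT date(session_start_time) AS running30day,
       COUNT(DISTINCT CASE
           WHEN date(session_start_time) >= date(date(session_start_time), '-30 day')
            AND date(session_start_time) <= date(session_start_time)
           THEN uid END) AS unique_30day
FROM users_sessions
GROUP BY date(session_start_time)
ORDER BY running30day
"""

# Plain daily distinct count for comparison.
DAILY = """
SELECT date(session_start_time) AS day, COUNT(DISTINCT uid) AS unique_day
FROM users_sessions
GROUP BY date(session_start_time)
ORDER BY day
"""

attempt = conn.execute(ATTEMPT).fetchall()
daily = conn.execute(DAILY).fetchall()
print(attempt == daily)  # True: the "30-day" column is the daily count
```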
+-----+-------------------------------+
| UID | SESSION_START_TIME            |
+-----+-------------------------------+
| 10  | 2020-01-13T01:46:07.000-05:00 |
| 5   | 2020-01-13T01:46:07.000-05:00 |
| 3   | 2020-01-18T02:49:23.000-05:00 |
| 9   | 2020-03-06T18:18:28.000-05:00 |
| 2   | 2020-03-06T18:18:28.000-05:00 |
| 8   | 2020-03-31T23:13:33.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 2   | 2020-08-28T18:23:15.000-04:00 |
| 9   | 2020-08-28T18:23:15.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 8   | 2020-09-15T16:40:29.000-04:00 |
| 3   | 2020-09-21T20:49:09.000-04:00 |
| 1   | 2020-11-05T21:31:48.000-05:00 |
| 6   | 2020-11-05T21:31:48.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 5   | 2020-12-12T04:42:00.000-05:00 |
+-----+-------------------------------+
Below is the result I want:
+------------+---------------------+
| DATE       | UNIQUE 30 DAY COUNT |
+------------+---------------------+
| 2020-01-13 | 3                   |
| 2020-01-18 | 1                   |
| 2020-03-06 | 3                   |
| 2020-03-31 | 1                   |
| 2020-08-28 | 4                   |
| 2020-09-15 | 2                   |
| 2020-09-21 | 1                   |
| 2020-11-05 | 2                   |
| 2020-12-12 | 2                   |
+------------+---------------------+
Thanks
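A quick check of the expected table against the sample rows shows that each value equals the number of distinct uids in the 30 days *starting* on that date. For example, 2020-01-13 counts uids 10, 5 and 3. A trailing window ending on each date gives different numbers. The pure-Python sketch below computes both so they can be compared. `window_counts` is a hypothetical helper, and timestamps are reduced to calendar dates.

```python
from datetime import date, timedelta

# (uid, session day) pairs from the sample table; time of day and offsets dropped.
SESSIONS = [
    (10, date(2020, 1, 13)), (5, date(2020, 1, 13)), (3, date(2020, 1, 18)),
    (9, date(2020, 3, 6)), (2, date(2020, 3, 6)), (8, date(2020, 3, 31)),
    (3, date(2020, 8, 28)), (2, date(2020, 8, 28)), (9, date(2020, 8, 28)),
    (3, date(2020, 8, 28)), (8, date(2020, 9, 15)), (3, date(2020, 9, 21)),
    (1, date(2020, 11, 5)), (6, date(2020, 11, 5)), (8, date(2020, 12, 12)),
    (8, date(2020, 12, 12)), (5, date(2020, 12, 12)),
]


def window_counts(sessions, days=30, forward=False):
    """Distinct uids in a `days`-long window around each session date.

    forward=False: window is [d - (days - 1), d]  (trailing, ends on d)
    forward=True:  window is [d, d + (days - 1)]  (starts on d)
    """
    span = timedelta(days=days - 1)
    result = {}
    for d in sorted({day for _, day in sessions}):
        lo, hi = (d, d + span) if forward else (d - span, d)
        result[d] = len({uid for uid, day in sessions if lo <= day <= hi})
    return result


expected = [3, 1, 3, 1, 4, 2, 1, 2, 2]  # the UNIQUE 30 DAY COUNT column
print(list(window_counts(SESSIONS, forward=True).values()) == expected)  # True
print(list(window_counts(SESSIONS).values()))  # [2, 3, 2, 3, 3, 4, 4, 2, 2]
```

Both answers below compute trailing windows. Their output on this data should therefore be compared with the second list, not with the table above.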
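Solution
You can do this by keeping a counter of when each user starts being counted and when they stop being counted 30 (or 31) days later. Then identify the "islands" during which a user is counted, and aggregate. This involves:
- Unpivoting the data so that each session has an "enter" count and a "leave" count.
- Accumulating those counts so that, for each user on each day, you know whether the user is counted.
- This defines the "islands" of counting. Determine where each island starts and ends, and discard all the rows in between.
- Now a cumulative sum over the dates gives the 30-day count on each date.
In SQL, this looks like: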
with t as (
select uid,date_trunc('day',session_start_time) as s_day,1 as inc
from users_sessions
union all
select uid,date_trunc('day',session_start_time) + interval '31 day' as s_day,-1
from users_sessions
),tt as ( -- increment the ins and outs to determine whether a uid is in or out on a given day
select uid,s_day,sum(inc) as day_inc,sum(sum(inc)) over (partition by uid order by s_day rows between unbounded preceding and current row) as running_inc
from t
group by uid,s_day
),ttt as ( -- find the beginning and end of the islands
select tt.uid,tt.s_day,(case when running_inc > 0 then 1 else -1 end) as in_island
from (select tt.*,lag(running_inc) over (partition by uid order by s_day) as prev_running_inc,lead(running_inc) over (partition by uid order by s_day) as next_running_inc
from tt
) tt
where (running_inc > 0 and (prev_running_inc = 0 or prev_running_inc is null)) or
      (running_inc = 0 and (next_running_inc > 0 or next_running_inc is null))
)
select s_day,sum(sum(in_island)) over (order by s_day rows between unbounded preceding and current row) as active_30
from ttt
group by s_day;
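Here is a db<>fiddle.
I'm fairly sure a simpler approach is to use a join. First build a list of every distinct user who had a session on each day, and a list of every distinct date in the data. Then join the user list one-to-many onto the date list and count distinct users. The key is the widened join condition: a pair of inequalities matches a range of dates onto each single date.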
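The enter/leave counting described in the steps above can be sketched in plain Python to make each step concrete. This is an illustrative re-implementation, not the answer's SQL. `rolling_active` and `active_on` are hypothetical helpers, and sessions are reduced to calendar dates. As in the SQL, a session on day d counts on days d through d + 30, which is the 31-day variant. Only island boundaries are stored, so the count on an arbitrary day is the running total at the last boundary on or before it.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date, timedelta
from itertools import accumulate

# (uid, session day) pairs from the question's sample, reduced to dates.
SESSIONS = [
    (10, date(2020, 1, 13)), (5, date(2020, 1, 13)), (3, date(2020, 1, 18)),
    (9, date(2020, 3, 6)), (2, date(2020, 3, 6)), (8, date(2020, 3, 31)),
    (3, date(2020, 8, 28)), (2, date(2020, 8, 28)), (9, date(2020, 8, 28)),
    (3, date(2020, 8, 28)), (8, date(2020, 9, 15)), (3, date(2020, 9, 21)),
    (1, date(2020, 11, 5)), (6, date(2020, 11, 5)), (8, date(2020, 12, 12)),
    (8, date(2020, 12, 12)), (5, date(2020, 12, 12)),
]


def rolling_active(sessions, days=31):
    """Return {boundary_day: active_uid_count} built from enter/leave events."""
    # Step 1: unpivot. Each session is +1 on its day and -1 `days` later,
    # summed per (uid, day) like `sum(inc) ... group by uid, s_day`.
    per_uid = defaultdict(lambda: defaultdict(int))
    for uid, d in sessions:
        per_uid[uid][d] += 1
        per_uid[uid][d + timedelta(days=days)] -= 1

    # Steps 2-3: running total per uid; keep only island boundaries:
    # +1 where the total rises from 0, -1 where it falls back to 0.
    edges = defaultdict(int)
    for events in per_uid.values():
        running = 0
        for d in sorted(events):
            prev, running = running, running + events[d]
            if prev == 0 and running > 0:
                edges[d] += 1
            elif prev > 0 and running == 0:
                edges[d] -= 1

    # Step 4: cumulative sum over boundary days = uids currently in an island.
    boundary_days = sorted(edges)
    return dict(zip(boundary_days, accumulate(edges[d] for d in boundary_days)))


def active_on(timeline, d):
    """Count on day d: the total at the last boundary on or before d (0 if none)."""
    keys = sorted(timeline)
    i = bisect_right(keys, d)
    return timeline[keys[i - 1]] if i else 0


timeline = rolling_active(SESSIONS)
session_days = sorted({d for _, d in SESSIONS})
print([active_on(timeline, d) for d in session_days])
```

On the sample data this prints `[2, 3, 2, 3, 3, 4, 4, 2, 2]`. The 2020-09-21 session adds no boundary for uid 3 because that uid is already inside the island that began on 2020-08-28. The lookup therefore falls back to the 2020-09-15 total.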
with users as
(select
distinct uid,date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01'),dates as
(select
distinct date_trunc('day',session_start_time) AS dt
from <table>
where session_start_time >= '2021-05-01')
select
count(distinct uid),dates.dt
from users
join
dates
on users.dt >= dates.dt - interval '29 days'
and users.dt <= dates.dt
group by dates.dt
order by dt desc
;
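To try the join approach without a Redshift cluster, here is a sketch that runs a SQLite port of the query through Python's `sqlite3` module on the question's sample rows. It rests on the following assumptions:
- the table is named `users_sessions`;
- timestamps are reduced to dates;
- the `>= '2021-05-01'` filter is dropped because the sample data is from 2020;
- `date(dates.dt, '-29 day')` replaces the Redshift date arithmetic.

```python
import sqlite3

# Sample rows from the question, reduced to calendar dates (assumption).
SESSIONS = [
    (10, "2020-01-13"), (5, "2020-01-13"), (3, "2020-01-18"),
    (9, "2020-03-06"), (2, "2020-03-06"), (8, "2020-03-31"),
    (3, "2020-08-28"), (2, "2020-08-28"), (9, "2020-08-28"), (3, "2020-08-28"),
    (8, "2020-09-15"), (3, "2020-09-21"),
    (1, "2020-11-05"), (6, "2020-11-05"),
    (8, "2020-12-12"), (8, "2020-12-12"), (5, "2020-12-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_sessions (uid INTEGER, session_start_time TEXT)")
conn.executemany("INSERT INTO users_sessions VALUES (?, ?)", SESSIONS)

# SQLite port of the join answer: every (uid, day) row is matched to each
# reporting date whose trailing 30-day window [dt - 29, dt] contains it.
QUERY = """
WITH users AS (
    SELECT DISTINCT uid, date(session_start_time) AS dt FROM users_sessions
), dates AS (
    SELECT DISTINCT date(session_start_time) AS dt FROM users_sessions
)
SELECT dates.dt, COUNT(DISTINCT users.uid) AS unique_30day
FROM users
JOIN dates
  ON users.dt >= date(dates.dt, '-29 day')
 AND users.dt <= dates.dt
GROUP BY dates.dt
ORDER BY dates.dt
"""

rows = conn.execute(QUERY).fetchall()
for dt, n in rows:
    print(dt, n)
```

On the sample data the counts for the nine session dates come out as 2, 3, 2, 3, 3, 4, 4, 2, 2. Each `users` row can match up to 30 rows of `dates`, so the intermediate join is at most about 30 times the size of the per-day user list.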