比如下表,记录了日期和对应用户:
datetime,username
2023-01-01,UserA
2023-01-01,UserB
2023-01-02,UserA
2023-01-03,UserC
...
需求是按天统计出该天起前七天的周活,当天没有对应用户的话补 0 ,如:
2023-01-08(2023-01-01~2023-01-07), 7
2023-01-09(2023-01-02~2023-01-08), 10
....
1
pota 2023-03-02 15:15:55 +08:00
这种我感觉还是将每天的统计数据存起来直接拿好点
|
2
oRbceGJ 2023-03-02 15:31:16 +08:00
用偏移类开窗函数?
|
3
Ericcccccccc 2023-03-02 15:34:17 +08:00
没必要强行用纯 sql 做这个事情吧.
|
4
aw2350 2023-03-02 15:35:36 +08:00
parttition by ,按照日期汇总
|
5
yuhu96 2023-03-02 15:41:12 +08:00
能不能按照当前[date ,date - 7 ]作为一个分组字段来进行聚合啊。窗口函数感觉不适用在这里。
|
6
byty18768 2023-03-02 15:42:45 +08:00
搞一张存储自然日的日期表,excel 生成很快的
` datetime,start_date,end_date 2023-01-08,2023-01-01,2023-01-07 2023-01-09,2023-01-02,2023-01-08 2023-01-10,2023-01-03,2023-01-09 ` 笛卡尔积 ` select a. datetime ,count(distinct username) from 日期表 a join 数据表 b on b. datetime between a. start_date and a.end_date group by 1 ` |
7
Chad0000 2023-03-02 15:43:23 +08:00
记录日活,主键:
日期_用户 Id 。没活的不记。然后周活就是最近七天的,Group By 用户 Id 即可。 |
8
JKeita 2023-03-02 15:50:02 +08:00
按天取数据存到一份文本文件中,然后直接用 linux 命令进行去重计数。
|
9
killva4624 OP @byty18768 直接用子查询生成这个这个自然日表也可以吧?
|
10
YYYYMMDDHHSS 2023-03-02 17:40:33 +08:00
with data as (
select '2023-01-01' as dt ,'UserA' as user_name union all select '2023-01-01' as dt ,'UserB' as user_name union all select '2023-01-02' as dt ,'UserA' as user_name union all select '2023-01-03' as dt ,'UserC' as user_name union all select '2023-01-04' as dt ,'UserA' as user_name union all select '2023-01-05' as dt ,'UserB' as user_name union all select '2023-01-06' as dt ,'UserA' as user_name union all select '2023-01-07' as dt ,'UserC' as user_name union all select '2023-01-09' as dt ,'UserA' as user_name union all select '2023-01-10' as dt ,'UserB' as user_name union all select '2023-01-11' as dt ,'UserA' as user_name union all select '2023-01-12' as dt ,'UserC' as user_name union all select '2023-01-13' as dt ,'UserA' as user_name union all select '2023-01-14' as dt ,'UserB' as user_name union all select '2023-01-15' as dt ,'UserA' as user_name union all select '2023-01-16' as dt ,'UserC' as user_name union all select '2023-01-17' as dt ,'UserA' as user_name union all select '2023-01-18' as dt ,'UserB' as user_name ) select t.dt, date_add(t.dt, 7) dt_to, count(distinct t.user_name) as cnt from ( select d1.dt, d2.user_name from data d1 join data d2 where 1 = 1 --and d1.dt > d2.dt and datediff(d1.dt, d2.dt) <= 7 ) t group by dt ---- 大数据量不建议这样做 |
12
vitoliu 2023-03-02 18:35:13 +08:00
SQL 挺难想的,我反正比较难硬加出来。通过代码来构建逻辑简单清晰,灵活性更强。
统计某一个时段的周活,可以按天、小时统计,定时任务去跑,还可以无限加条件。比如用户注册时间小于 3 个月,判断新用户留存情况。 |
13
28Sv0ngQfIE7Yloe 2023-03-02 18:41:42 +08:00
这种不都是定时调度 ETL 做成指标放到数仓里吗?
|
14
byty18768 2023-03-06 10:27:40 +08:00
@killva4624 主要是你不是说可能会存在整天都没有用户的嘛,所以考虑导入表来补 0 。如果底层每天都是全的肯定没问题
|