Getting a unique count of users in the last 7 days
I have a dataset in which I want to find the people who were active in the previous 7 days. For example:
date <- c('2009-01-03', '2009-01-03', '2009-01-03', '2009-01-04', '2009-01-05', '2009-02-01')
person <- c('Abe', 'John', 'Abe', 'Kate', 'Jessica', 'Anu')
df <- data.frame(date, person)
I want to create a column called last_seven_days_active that holds the count of unique users who were active in the 7 days before each row's date.
last_seven_days_active
0
0
0
2
3
0
I tried the following. Any suggestions?
library(zoo)
df$last_seven_days_active <- rollsumr(df$person, k = 8, fill = NA)
A base R solution:
df$date <- as.Date(as.character(df$date))
df$last_seven_days_active <- with(df, sapply(date, function(x) length(unique(person[date >= x - 7 & date < x]))))
Output:
        date  person last_seven_days_active
1 2009-01-03     Abe                      0
2 2009-01-03    John                      0
3 2009-01-03     Abe                      0
4 2009-01-04    Kate                      2
5 2009-01-05 Jessica                      3
6 2009-02-01     Anu                      0
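The same idea can be wrapped in a small function so the window length is not hard-coded. This is only a sketch: the name count_active_before and its window argument are made up for illustration and are not part of the answer above. It counts distinct persons in the half-open interval [date - window, date), so same-day activity is excluded, matching the expected output in the question.

```r
# Hypothetical helper (name and arguments are illustrative only):
# for each row, count distinct persons seen in the `window` days
# strictly before that row's date, i.e. in [date - window, date).
count_active_before <- function(date, person, window = 7) {
  date <- as.Date(date)
  vapply(seq_along(date), function(i) {
    in_window <- date >= date[i] - window & date < date[i]
    length(unique(person[in_window]))
  }, integer(1))
}

# Sample data from the question
df <- data.frame(
  date = c("2009-01-03", "2009-01-03", "2009-01-03",
           "2009-01-04", "2009-01-05", "2009-02-01"),
  person = c("Abe", "John", "Abe", "Kate", "Jessica", "Anu")
)
df$last_seven_days_active <- count_active_before(df$date, df$person)
```

Using vapply with integer(1) rather than sapply guarantees an integer vector even when the input has zero rows.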
An option with between and map:
library(dplyr)
library(purrr)
df %>%
  mutate(date = as.Date(date),  # make sure date is a Date before comparing
         last_seven_days_active = map_dbl(date,
           ~ n_distinct(person[between(date, .x - 7, .x) & date != .x])))
# date person last_seven_days_active
#1 2009-01-03 Abe 0
#2 2009-01-03 John 0
#3 2009-01-03 Abe 0
#4 2009-01-04 Kate 2
#5 2009-01-05 Jessica 3
#6 2009-02-01 Anu 0
An option using data.table:
library(data.table)
setDT(df)[, date := as.IDate(date, format = "%Y-%m-%d")]
df[, days7ago := date - 7L]
df[, last_seven_days_active :=
    df[df, on = .(date >= days7ago, date < date), by = .EACHI,
       length(unique(person[!is.na(person)]))]$V1
]
Output:
         date  person   days7ago last_seven_days_active
1: 2009-01-03     Abe 2008-12-27                      0
2: 2009-01-03    John 2008-12-27                      0
3: 2009-01-03     Abe 2008-12-27                      0
4: 2009-01-04    Kate 2008-12-28                      2
5: 2009-01-05 Jessica 2008-12-29                      3
6: 2009-02-01     Anu 2009-01-25                      0
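When many rows share the same date, the per-row approaches recompute an identical window over and over. One possible variation, sketched here in base R, is to compute the count once per distinct date and map it back with match. The variable names unique_dates and counts_by_date are illustrative and do not come from the answers above.

```r
# Sample data from the question
date <- as.Date(c("2009-01-03", "2009-01-03", "2009-01-03",
                  "2009-01-04", "2009-01-05", "2009-02-01"))
person <- c("Abe", "John", "Abe", "Kate", "Jessica", "Anu")

# Compute the window count once per distinct date ...
unique_dates <- unique(date)
counts_by_date <- vapply(seq_along(unique_dates), function(i) {
  d <- unique_dates[i]
  length(unique(person[date >= d - 7 & date < d]))
}, integer(1))

# ... then look the count up for every row.
last_seven_days_active <- counts_by_date[match(date, unique_dates)]
```

Whether this is worth it depends on how many duplicate dates the data has; for a handful of rows the simpler per-row version is likely fine.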