Count events by group per day including 0 in R
I want a column of event counts per day, including dates with no events. Here is a sample of my data, although my real data set has more than 100 IDs:
dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("1/01/2000", "2/01/2000", "2/01/2000",
"5/01/2000", "5/01/2000", "5/01/2000", "6/01/2000", "2/01/2000", "3/01/2000",
"3/01/2000", "4/01/2000", "4/01/2000", "4/01/2000", "4/01/2000",
"5/01/2000", "9/01/2000")), .Names = c("id", "date"),
row.names = c(NA, -16L), class = "data.frame")
What I want is:
date count 1 count 2
1/01/2000 1 0
2/01/2000 2 1
3/01/2000 0 2
4/01/2000 0 4
5/01/2000 3 1
6/01/2000 1 0
7/01/2000 0 0
8/01/2000 0 0
9/01/2000 0 1
My real data has dates from 1/01/2000 to 31/12/2000. I want every ID to have all of these dates, even if there were zero events on some days.
We can use complete and then reshape to 'wide' with pivot_wider. The OP's dates run up to 31/12/2000, so the format is day/month/year: parse them with dmy(date) and build the sequence by 'day'. If the original data is in year-month-day format, change dmy(date) to ymd(date).
library(lubridate)
library(tidyr)
library(dplyr)
library(stringr)
dt %>%
  # parse day/month/year strings and give every event a weight of 1
  mutate(date = dmy(date), count = 1) %>%
  group_by(id = str_c('count', id)) %>%
  # add the missing days (over the full range of the data) for each id
  complete(date = seq(min(.$date, na.rm = TRUE),
                      max(.$date, na.rm = TRUE), by = 'day'),
           fill = list(count = 0)) %>%
  ungroup %>%
  # one column per id, summing the event weights per day
  pivot_wider(names_from = id, values_from = count,
              values_fn = sum, values_fill = 0)
Output:
# A tibble: 9 × 3
  date       count1 count2
  <date>      <dbl>  <dbl>
1 2000-01-01      1      0
2 2000-01-02      2      1
3 2000-01-03      0      2
4 2000-01-04      0      4
5 2000-01-05      3      1
6 2000-01-06      1      0
7 2000-01-07      0      0
8 2000-01-08      0      0
9 2000-01-09      0      1
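As a variation on the complete idea, here is a minimal sketch that counts first and then completes against an explicit date range, which may be convenient when the range is known up front (as with the OP's full year). The small data frame `events` and the five-day range are made up for illustration, and the dates are assumed to be day/month/year.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: day/month/year strings, as in the question
events <- data.frame(
  id   = c(1L, 1L, 1L, 2L),
  date = c("1/01/2000", "1/01/2000", "3/01/2000", "3/01/2000")
)

# Assumed range; for the real data this would be 2000-01-01 to 2000-12-31
all_days <- seq(as.Date("2000-01-01"), as.Date("2000-01-05"), by = "day")

res <- events %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  count(id, date) %>%                                   # events per id and day
  complete(id, date = all_days, fill = list(n = 0L)) %>% # add the zero days
  pivot_wider(names_from = id, values_from = n, names_prefix = "count")

res
```

Counting before completing keeps the intermediate table small, since each id/day pair appears at most once before the zero rows are added.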
Here is an approach using data.table (it assumes ISO-formatted dates, as in the input shown at the end of this answer):
library(data.table)
setDT(dt)[,`:=`(date=as.Date(date, "%Y-%m-%d"),id=paste0("count",id))]
dcast(
  dt[SJ(date = seq(min(date), max(date), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]
Output:
date count1 count2
1: 2020-01-01 1 0
2: 2020-01-02 2 1
3: 2020-01-03 0 2
4: 2020-01-04 0 4
5: 2020-01-05 3 1
6: 2020-01-06 1 0
7: 2020-01-07 0 0
8: 2020-01-08 0 0
9: 2020-01-09 0 1
If you know your date range in advance, as you indicated in your post, you can use those dates directly:
library(data.table)
setDT(dt)[,`:=`(date=as.Date(date, "%Y-%m-%d"), id=paste0("count",id))]
result = dcast(
  dt[SJ(date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1)), on = .(date)],
  date ~ id, fun.aggregate = length
)[, `NA` := NULL]
Output:
date count1 count2
1: 2020-01-01 1 0
2: 2020-01-02 2 1
3: 2020-01-03 0 2
4: 2020-01-04 0 4
5: 2020-01-05 3 1
---
362: 2020-12-27 0 0
363: 2020-12-28 0 0
364: 2020-12-29 0 0
365: 2020-12-30 0 0
366: 2020-12-31 0 0
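A related data.table sketch, not part of the answer above: build the full id-by-day grid with CJ and count matches per grid row with by = .EACHI, so no NA column has to be dropped afterwards. The table `ev`, the grid name and the five-day range are illustrative assumptions.

```r
library(data.table)

# Hypothetical toy data with ISO-formatted dates
ev <- data.table(
  id   = c(1L, 1L, 1L, 2L),
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-03", "2020-01-03"))
)

# Every combination of id and day in the assumed range
grid <- CJ(
  id   = unique(ev$id),
  date = seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day")
)

# For each grid row, count the matching events (0 when there is no match)
counts <- ev[grid, on = .(id, date), .N, by = .EACHI]
counts[, id := paste0("count", id)]

# One column per id
wide <- dcast(counts, date ~ id, value.var = "N")
wide
```

Because the grid already contains every id, days on which a given id has no events still appear with a count of 0.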
Input (ISO-formatted dates, as used by the data.table code above):
dt = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), date = c("2020-01-01", "2020-01-02", "2020-01-02",
"2020-01-05", "2020-01-05", "2020-01-05", "2020-01-06", "2020-01-02", "2020-01-03",
"2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-05",
"2020-01-09")), row.names = c(NA, -16L), class = "data.frame")
Here is a base R option using table + seq + factor:
with(
  transform(
    dt,
    date = as.Date(date, "%d/%m/%Y")
  ),
  table(
    factor(date,
      levels = as.character(seq(min(date), max(date), 1))
    ),
    id
  )
)
This gives
id
1 2
2000-01-01 1 0
2000-01-02 2 1
2000-01-03 0 2
2000-01-04 0 4
2000-01-05 3 1
2000-01-06 1 0
2000-01-07 0 0
2000-01-08 0 0
2000-01-09 0 1
Or, if we want data.frame output, we can additionally use reshape + as.data.frame:
reshape(
  as.data.frame(
    with(
      transform(
        dt,
        date = as.Date(date, "%d/%m/%Y")
      ),
      table(
        factor(date,
          levels = as.character(seq(min(date), max(date), 1))
        ),
        id
      )
    )
  ),
  idvar = "Var1",
  timevar = "id",
  direction = "wide"
)
This gives
Var1 Freq.1 Freq.2
1 2000-01-01 1 0
2 2000-01-02 2 1
3 2000-01-03 0 2
4 2000-01-04 0 4
5 2000-01-05 3 1
6 2000-01-06 1 0
7 2000-01-07 0 0
8 2000-01-08 0 0
9 2000-01-09 0 1
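To cover a fixed range such as the OP's full year rather than only the span present in the data, the same table + factor idea can be wrapped in a small helper. This is a sketch only: `count_by_day`, its arguments and the toy data are hypothetical names introduced here, and the dates are assumed to be day/month/year strings.

```r
# Hypothetical helper: daily counts per id over an explicit date range
count_by_day <- function(df, from, to) {
  days  <- seq(as.Date(from), as.Date(to), by = "day")
  dates <- as.Date(df$date, format = "%d/%m/%Y")
  # Days without events survive because they are declared as factor levels
  tab <- table(factor(as.character(dates), levels = as.character(days)), df$id)
  counts <- as.data.frame.matrix(tab)
  names(counts) <- paste0("count", names(counts))
  out <- cbind(date = days, counts)
  rownames(out) <- NULL
  out
}

# Toy data for illustration only
events <- data.frame(
  id   = c(1L, 1L, 1L, 2L),
  date = c("1/01/2000", "1/01/2000", "3/01/2000", "3/01/2000")
)

out <- count_by_day(events, "2000-01-01", "2000-01-05")
out
```

For the real data the call would presumably be count_by_day(dt, "2000-01-01", "2000-12-31"), giving one row per day of the year.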