dplyr: Why are cases not summarized using summarise()?
I have
> head(df,7)
date pos cons_week
1 2020-03-30 313 169
2 2020-03-31 255 169
3 2020-04-01 282 169
4 2020-04-02 382 169
5 2020-04-03 473 169
6 2020-04-04 312 169
7 2020-04-05 158 169
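pos is the number of positive COVID cases per day. cons_week is the consecutive week number since lockdown. So for each cons_week I have 7 entries of pos. I want to summarise the data so that I get the total of pos for each week.
I tried different versions, such as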
df %>% group_by(cons_week) %>%
summarise(n = n())
or
df %>% group_by(cons_week, pos) %>%
summarise(n = sum())
Expected output
cons_week n
169 2175
170 1651
171 1179
Data
df <- structure(list(date = structure(c(18351, 18352, 18353, 18354,
18355, 18356, 18357, 18358, 18359, 18360, 18361, 18362, 18363,
18364, 18365, 18366, 18367, 18368, 18369, 18370, 18371), class = "Date"),
pos = c("313", "255", "282", "382", "473", "312", "158",
"424", "347", "301", "140", "142", "140", "157", "156", "258",
"199", "178", "168", "106", "114"), cons_week = c(169, 169,
169, 169, 169, 169, 169, 170, 170, 170, 170, 170, 170, 170,
171, 171, 171, 171, 171, 171, 171)), row.names = c(NA, 21L
), class = "data.frame")
That is because pos in your df is character. You need to convert it to numeric first. For example:
library(dplyr)
df %>%
mutate(pos = as.numeric(pos)) %>%
group_by(cons_week) %>%
summarise(n = sum(pos))
Or:
df %>%
group_by(cons_week) %>%
summarise(n = sum(as.numeric(pos)))
Output:
cons_week n
<dbl> <dbl>
1 169 2175
2 170 1651
3 171 1179
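As a quick check before aggregating, it can help to inspect the column type first. The following is a minimal sketch on a cut-down stand-in for the question's data; df_small and weekly are hypothetical names used only for illustration, and only the first three rows of week 169 are included.

```r
library(dplyr)

# Hypothetical stand-in for the question's df: three rows of week 169,
# with pos stored as character, as in the original data
df_small <- data.frame(
  cons_week = c(169, 169, 169),
  pos = c("313", "255", "282"),
  stringsAsFactors = FALSE
)

# Inspect the column type before summing
pos_class <- class(df_small$pos)   # "character"

# Convert once, then sum within each week
weekly <- df_small %>%
  mutate(pos = as.numeric(pos)) %>%
  group_by(cons_week) %>%
  summarise(n = sum(pos))
```

Here weekly has one row for week 169 with n equal to 313 + 255 + 282 = 850. Note that as.numeric() returns NA (with a warning) for any string that is not a valid number, so it is worth checking for NA values after converting real data.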
Use:
df %>% group_by(cons_week) %>%
summarise(n = sum(as.numeric(pos)))
Or convert the column beforehand:
df$pos <- as.numeric(df$pos)
df %>% group_by(cons_week) %>%
summarise(n = sum(pos))
The problem is that pos is of class character, not numeric.
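To see why the attempts in the question did not give weekly totals, here is a small base R sketch. The vector pos_chr is a hypothetical stand-in for the pos column; the point is only that sum() rejects character input and that sum() with no arguments returns 0.

```r
# Hypothetical stand-in for the character pos column
pos_chr <- c("313", "255", "282")

# sum() on a character vector raises an error; capture the message
# instead of stopping the script
err <- tryCatch(sum(pos_chr), error = function(e) conditionMessage(e))

# sum() called with no arguments, as in summarise(n = sum()), is just 0
empty_sum <- sum()

# After conversion the sum works as expected
total <- sum(as.numeric(pos_chr))
```

In this sketch err holds an error message string rather than a number, empty_sum is 0, and total is 850. Likewise, summarise(n = n()) only counts the rows in each group, which is why it returns 7 per week rather than the case totals.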