Create a new variable that summarizes observations given a certain condition
Hello, I am new to R and I don't understand why my approach below is not working. I have this df1, which looks like this:
view duration_hours date
1 a 5 2021-03-29
2 a 7 2021-03-29
3 a 3 2021-03-30
4 b 2 2021-03-30
5 b 5 2021-03-30
6 c 9 2021-03-30
7 c 2 2021-03-31
8 c 3 2021-04-01
I would like a new data frame (df2) that sums the duration over all views for each date and also splits it out by individual view:
        date duration_sum  a b c
1 2021-03-29           12 12 0 0
2 2021-03-30           19  3 7 9
3 2021-03-31            2  0 0 2
4 2021-04-01            3  0 0 3
First, I tried the following for the overall duration only. As expected, it created the duration_sum variable containing the total duration for each date:
df2 <- df1 %>%
  group_by(date) %>%
  summarise(duration_sum = sum(duration_hours, na.rm = TRUE))
Then I tried to add the other variables by extending the code as follows:
df2<- df1 %>%
group_by(date) %>%
summarise(duration_sum = sum(duration_hours, na.rm = TRUE),
a =sum(duration_hours[view=="a"], na.r = TRUE),
b =sum(duration_hours[view=="b"], na.r = TRUE),
c =sum(duration_hours[view=="c"], na.r = TRUE))
But this does not produce the correct sums for the individual views. What am I doing wrong?
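The argument is na.rm, not na.r. sum() is defined as sum(..., na.rm = FALSE), and arguments that come after ... must be matched by their full name, so na.r = TRUE is not treated as na.rm. Instead it is passed into ... and summed along with the data, with TRUE coerced to 1 (FALSE would be coerced to 0), so each total comes out too high by 1.
For example: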
sum(c(1, 2), na.r = TRUE)
#[1] 4
sum(c(1, 2), na.rm = TRUE)
#[1] 3
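The following base R sketch illustrates the same point a little further. It shows that the typo goes unnoticed when the value is FALSE, that NA values are not removed when na.r is used, and where the misspelled argument ends up. The helper show_dots is hypothetical and exists only for this illustration; it merely mimics the shape of sum()'s signature.

```r
# sum() is defined as sum(..., na.rm = FALSE); arguments after `...`
# are only matched when spelled out in full.
total_typo  <- sum(c(1, 2), na.r = TRUE)      # TRUE is summed as 1
total_false <- sum(c(1, 2), na.r = FALSE)     # FALSE is summed as 0, typo stays hidden
total_na    <- sum(c(1, 2, NA), na.r = TRUE)  # NA is not removed
total_ok    <- sum(c(1, 2, NA), na.rm = TRUE) # NA is removed

# Hypothetical helper with the same signature shape as sum():
# it returns the names of everything captured by `...`.
show_dots <- function(..., na.rm = FALSE) names(list(...))
captured <- show_dots(1, 2, na.r = TRUE)

print(c(total_typo, total_false, total_na, total_ok))
print(captured)
```

Because the misspelled name is swallowed by ... without any warning, a quick check of a single group by hand is a reasonable way to catch this kind of mistake.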
The OP's corrected code is:
library(dplyr)
df1 %>%
group_by(date) %>%
summarise(duration_sum = sum(duration_hours, na.rm = TRUE),
a =sum(duration_hours[view=="a"], na.rm = TRUE),
b =sum(duration_hours[view=="b"], na.rm = TRUE),
c =sum(duration_hours[view=="c"], na.rm = TRUE))
# A tibble: 4 x 5
# date duration_sum a b c
#* <chr> <int> <int> <int> <int>
#1 2021-03-29 12 12 0 0
#2 2021-03-30 19 3 7 9
#3 2021-03-31 2 0 0 2
#4 2021-04-01 3 0 0 3
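Another option is pivot_wider from tidyr, which produces the per-view columns a, b and c without naming each view by hand (it does not add duration_sum):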
library(tidyr)
pivot_wider(df1, names_from = view, values_from = duration_hours,
values_fn = sum, values_fill = 0)
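If the duration_sum column is also wanted from the pivot_wider route, one possible way is to add a row total afterwards. This is only a sketch, not part of the original answer: the object name wide is made up for illustration, it assumes every column other than date is a per-view duration, and it assumes dplyr 1.0.0 or later for across() and the .after argument of mutate().

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the example is self-contained
df1 <- data.frame(
  view = c("a", "a", "a", "b", "b", "c", "c", "c"),
  duration_hours = c(5L, 7L, 3L, 2L, 5L, 9L, 2L, 3L),
  date = c("2021-03-29", "2021-03-29", "2021-03-30", "2021-03-30",
           "2021-03-30", "2021-03-30", "2021-03-31", "2021-04-01")
)

wide <- df1 %>%
  # one column per view, summing durations within each date/view pair
  pivot_wider(names_from = view, values_from = duration_hours,
              values_fn = sum, values_fill = 0) %>%
  # row total over every column except date, placed right after date
  mutate(duration_sum = rowSums(across(-date)), .after = date)

print(wide)
```

Compared with spelling out a, b and c inside summarise(), this keeps working when new views appear in the data, at the cost of an extra step for the total.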
Data
df1 <- structure(list(view = c("a", "a", "a", "b", "b", "c", "c", "c"
), duration_hours = c(5L, 7L, 3L, 2L, 5L, 9L, 2L, 3L), date = c("2021-03-29",
"2021-03-29", "2021-03-30", "2021-03-30", "2021-03-30", "2021-03-30",
"2021-03-31", "2021-04-01")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))