Having trouble finding simple means with dplyr - `mean()` command is not behaving as envisioned
I'm trying to use dplyr to find the simple mean of a column of values that I have grouped together.
My initial attempt was to enter code like this:
cust_id_flags_3 = customer_sleep %>% group_by(flags) %>% count(flags) %>% summarise(mean_val = mean(n))
But the output I get is this table:
# A tibble: 27 x 2
flags mean_val
<dbl> <dbl>
1 0 1966
2 1 2555
3 2 1263
4 3 1694
5 4 1452
6 5 989
7 6 879
8 7 709
9 8 712
10 9 530
# ... with 17 more rows
What I want is the mean of the values in the mean_val column. I can get it by calculating it manually:
> mean_test = sum(cust_id_flags_3$mean_val)/nrow(cust_id_flags_3)
> mean_test
[1] 569.037
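The manual calculation above is simply the arithmetic mean, so base R's `mean()` on the same column should give the same number. A minimal sketch, using the `n` values copied from the `dput()` output that follows (`n_values`, `manual` and `builtin` are illustrative names):

```r
# The n column from the dput() output of cust_id_flags_3
n_values <- c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
              712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
              86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)

# Manual version from the question: total divided by number of rows
manual <- sum(n_values) / length(n_values)

# Built-in arithmetic mean of the same values
builtin <- mean(n_values)
builtin  # about 569.037
```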
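Below is the dataset I used for the calculation. I know I'm doing something wrong in how I apply the tidyverse verbs. For context, I'm doing this so that I can illustrate an approach using Poisson regression. Thanks for your help.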
> dput(cust_id_flags_3)
structure(list(flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), groups = structure(list(
flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -27L), .drop = TRUE))
Your data is already grouped by `flags`. I can reproduce the mean of 569 by ungrouping it first:
library(dplyr)
# df is the data from your dput() output
df %>%
  ungroup() %>%
  summarise(mean_val = mean(n))
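A sketch of why the grouping matters, assuming dplyr 1.x: `count()` keeps the grouping of its input, so after `group_by(flags) %>% count(flags)` a following `summarise()` still runs once per flag. The first part rebuilds the question's data (with `flags` as integers `0:26` rather than doubles, which does not affect the result); the second part applies the same `ungroup()` fix to the original pipeline, using a small hypothetical stand-in for `customer_sleep`, since the real data isn't shown.

```r
library(dplyr)

# Part 1: the question's counts, grouped by flags as count() left them
counts <- tibble(
  flags = 0:26,
  n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
        712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
        86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)
) %>%
  group_by(flags)

# Still grouped: one mean per flag, so 27 rows that just echo n
per_flag <- counts %>% summarise(mean_val = mean(n))

# Ungrouped: a single row with the mean over all 27 flags
overall <- counts %>% ungroup() %>% summarise(mean_val = mean(n))
overall$mean_val  # about 569.037

# Part 2: the original pipeline with the grouping dropped.
# Hypothetical toy customer_sleep: one row per customer, one flags value each
customer_sleep <- tibble(flags = c(0, 0, 0, 1, 1, 2))

toy_overall <- customer_sleep %>%
  group_by(flags) %>%
  count(flags) %>%               # still grouped by flags here
  ungroup() %>%                  # drop the grouping before the final summary
  summarise(mean_val = mean(n))  # counts are 3, 2, 1, so mean_val is 2
```

Calling `count(flags)` directly on the ungrouped data, without the preceding `group_by(flags)`, should give the same ungrouped counts.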
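Right now each group (each `flags` value) has only one value, so each mean is just that value divided by 1. If I modify your data so that some groups contain more than one value, `group_by()` combined with `summarise()` works as expected: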
df <- tibble(
flags = c(
0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
),
n = c(
1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L
)
)
df %>%
group_by(flags) %>%
summarise(mean_val = mean(n), count = n())
`count = n()` adds an integer column with the number of observations in each group.
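As a check on the modified data, a sketch that keeps only the two groups that now hold two rows. With these numbers, `flags` 1 should average 2555 and 1263 (1909) and `flags` 2 should average 1694 and 1452 (1573), each with a count of 2; the 27 rows collapse to 25 groups. The name `per_group` is illustrative.

```r
library(dplyr)

# Same modified data as above: flags 1 and 2 now appear twice
df <- tibble(
  flags = c(0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10, 11,
            12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
  n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
        712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
        86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)
)

# One row per distinct flags value, with its mean and row count
per_group <- df %>%
  group_by(flags) %>%
  summarise(mean_val = mean(n), count = n())

# Only the groups with more than one observation
filter(per_group, flags %in% c(1, 2))
```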