Having trouble finding simple means with dplyr - `mean()` command is not behaving as expected

Question

I am trying to use dplyr to find the simple mean of a column of values that I have grouped together.

My initial attempt was the following code:

cust_id_flags_3 = customer_sleep %>% group_by(flags) %>% count(flags) %>% summarise(mean_val = mean(n))

But the output I get is a table with one row per flag:

# A tibble: 27 x 2
   flags mean_val
   <dbl>    <dbl>
 1     0     1966
 2     1     2555
 3     2     1263
 4     3     1694
 5     4     1452
 6     5      989
 7     6      879
 8     7      709
 9     8      712
10     9      530
# ... with 17 more rows

What I want is the mean of the values in the mean_val column. I can get it by computing it manually:

> mean_test = sum(cust_id_flags_3$mean_val)/nrow(cust_id_flags_3) 
> mean_test
[1] 569.037

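Below is the dataset I am using for the calculation. I know I am doing something wrong when applying the tidyverse verbs. For context, I am doing this so that I can illustrate the means for use in a Poisson regression. Thanks for your help.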

> dput(cust_id_flags_3)
structure(list(flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), 
    n = c(1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L, 
    712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L, 
    86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), groups = structure(list(
    flags = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), .rows = structure(list(
        1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
        14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
        25L, 26L, 27L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -27L), .drop = TRUE))

Answer 1

Your data is already grouped. I can reproduce the mean of 569 by ungrouping it first:

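library(dplyr)

# df holds the grouped data from the question's dput() output
df %>%
  ungroup() %>%                  # drop the grouping by flags
  summarise(mean_val = mean(n))  # one overall mean across all rows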
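For a copy-and-paste check, here is a minimal sketch that rebuilds the question's data as a grouped tibble and compares the two results. It assumes `flags` can be written as `0:26` (integer rather than double, which does not affect the means); the names `grouped_result` and `overall` are only illustrative.

```r
library(dplyr)

# Rebuild the data from the question's dput() output and group it by flags,
# mimicking the grouped_df the question ends up with.
df <- tibble(
  flags = 0:26,
  n = c(
    1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
    712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
    86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L
  )
) %>%
  group_by(flags)

# Still grouped: summarise() returns one row per flag,
# and each "mean" is just that flag's single n value.
grouped_result <- df %>% summarise(mean_val = mean(n))
nrow(grouped_result)  # 27

# Ungrouped: summarise() collapses everything to one overall mean.
overall <- df %>% ungroup() %>% summarise(mean_val = mean(n))
overall$mean_val      # about 569.037, matching the manual calculation
```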

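Right now there is only one value per group (flag), so the mean is always value/1. If I adjust your data so that some groups contain more than one value, `group_by()` combined with `summarise()` works as expected.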

df <- tibble(
  flags = c(
    0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10, 11,
    12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
  ),
  n = c(
    1966L, 2555L, 1263L, 1694L, 1452L, 989L, 879L, 709L,
    712L, 530L, 526L, 435L, 398L, 334L, 233L, 174L, 145L, 114L,
    86L, 61L, 36L, 25L, 21L, 13L, 3L, 7L, 4L
  )
)
df %>% 
  group_by(flags) %>%
  summarise(mean_val = mean(n), count = n())

`count = n()` adds an integer column with the number of observations in each group.
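Applied to the original pipeline, one way to avoid the problem is to drop the `group_by(flags)` step: `count(flags)` on an ungrouped table returns an ungrouped result, so the following `summarise()` yields a single row. The sketch below uses a tiny hypothetical stand-in, `customer_sleep_demo`, because the real `customer_sleep` data is not shown in the question.

```r
library(dplyr)

# Hypothetical stand-in for customer_sleep: one row per customer record.
customer_sleep_demo <- tibble(flags = c(0, 0, 0, 1, 1, 2))

# count() tallies rows per flag (3, 2 and 1 here); because the input
# was never grouped, summarise() then averages across all flags.
result <- customer_sleep_demo %>%
  count(flags) %>%
  summarise(mean_val = mean(n))

result$mean_val  # 2
```

If the grouping is needed for earlier steps, calling `ungroup()` just before `summarise()`, as in the answer above, gives the same result.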
