Multiple groups - weighted mean - not working in R (using dplyr)
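I am trying to get the weighted mean of some soil properties, weighted by their surface but aggregated by owner and other levels. My dataset looks like this: full_dataset
I would like to get means like this: mean_dataset. My current code is: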
results<- original_df %>%
group_by(owner,cultivar) %>%
summarise(across(.cols= where(is.numeric),fns= weighted.mean(.,w= surface)))
But it is not working at the moment: the output has the same number of rows as the original dataset. What am I missing?
Data:
structure(list(town = c("a", "a", "a", "a", "b", "b", "b", "b",
"b", "C", "C", "C", "C", "C", "C"), cultivar = c(1L, 1L, 2L,
2L, 1L, 2L, 3L, 4L, 4L, 3L, 3L, 2L, 2L, 1L, 1L), owner = c("A",
"A", "B", "C", "A", "B", "B", "C", "C", "B", "B", "C", "C", "A",
"A"), surface = c(456L, 446L, 462L, 120L, 204L, 250L, 642L, 466L,
580L, 258L, 146L, 617L, 79L, 304L, 48L), x1 = c(202L, 647L, 525L,
536L, 563L, 269L, 376L, 492L, 229L, 177L, 413L, 156L, 77L, 79L,
609L), x2 = c(91L, 334L, 110L, 533L, 161L, 605L, 344L, 380L,
221L, 368L, 179L, 531L, 357L, 66L, 157L), x3 = c(300L, 90L, 570L,
43L, 403L, 245L, 640L, 344L, 371L, 70L, 546L, 400L, 255L, 176L,
336L)), class = "data.frame", row.names = c(NA, -15L))
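If we use `.` as the placeholder for the current column, we need a lambda (`~`): `~ expr` is shorthand for `function(.) expr`. Note also that the argument of `across()` is named `.fns`, not `fns`, and that the example below selects only the `x` columns with `starts_with('x')` so that `surface` itself is not averaged.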
library(dplyr)
original_df %>%
group_by(owner,cultivar) %>%
dplyr::summarise(across(.cols= starts_with('x'),
~ weighted.mean(.,w= surface)), .groups = 'drop')
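To make the lambda shorthand concrete, here is a minimal sketch comparing the `~` form with an explicit anonymous function. It assumes dplyr is installed; `toy_df`, `res_formula`, `res_function`, and the argument name `col` are illustrative names, and the four rows are a small subset of the question's data rather than the full dataset.

```r
library(dplyr)

# Toy data (assumption: a 4-row subset of the question's data frame)
toy_df <- data.frame(
  owner    = c("A", "A", "B", "B"),
  cultivar = c(1L, 1L, 2L, 2L),
  surface  = c(456L, 446L, 462L, 250L),
  x1       = c(202L, 647L, 525L, 269L)
)

# Formula-style lambda: `.` stands for the column currently being summarised
res_formula <- toy_df %>%
  group_by(owner, cultivar) %>%
  summarise(across(starts_with("x"), ~ weighted.mean(., w = surface)),
            .groups = "drop")

# The same computation written as an explicit anonymous function
res_function <- toy_df %>%
  group_by(owner, cultivar) %>%
  summarise(across(starts_with("x"),
                   function(col) weighted.mean(col, w = surface)),
            .groups = "drop")

# Both forms should give one row per owner/cultivar group with equal values
isTRUE(all.equal(res_formula, res_function))
```

Either spelling should collapse each group to a single row; the explicit `function(col)` form can be easier to read when the body grows beyond one call.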
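The same result can also be obtained without a lambda, by dropping the `()` and the `.` and passing `weighted.mean` directly: `w` is a named argument already set to `surface`, so each `x` column is implicitly passed as the first argument in turn. Note that in dplyr 1.1.0 and later, passing extra arguments through the `...` of `across()` is deprecated, so the lambda form is the safer choice.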
original_df %>%
group_by(owner,cultivar) %>%
dplyr::summarise(across(.cols= starts_with('x'),
weighted.mean, w= surface), .groups = 'drop')
Output:
# A tibble: 5 × 5
owner cultivar x1 x2 x3
<chr> <int> <dbl> <dbl> <dbl>
1 A 1 376. 172. 226.
2 B 2 435. 284. 456.
3 B 3 332. 327. 486.
4 C 2 204. 514. 333.
5 C 4 346. 292. 359.
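As a sanity check on the first row of the output, the weighted mean of `x1` for owner A, cultivar 1 can be recomputed by hand in base R. This is only a sketch: the two vectors are copied from the five rows of the question's data where `owner == "A"` and `cultivar == 1`, and `manual` and `builtin` are illustrative names.

```r
# surface and x1 for the rows with owner == "A" and cultivar == 1
surface <- c(456, 446, 204, 304, 48)
x1      <- c(202, 647, 563, 79, 609)

# Weighted mean by definition: sum(value * weight) / sum(weight)
manual <- sum(x1 * surface) / sum(surface)

# Same quantity using the base R helper called in the answer
builtin <- weighted.mean(x1, w = surface)

round(manual)  # 376, matching the x1 value shown for owner A, cultivar 1
```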