How to summarise weighted data
Is it possible to use weights with the dplyr `summarise` function?
Let's say I want to compute a weighted table of `SEX`, using `PHHWT14` as the weight:
dta = structure(list(PHHWT14 = c(530, 457, 416, 497, 395, 480, 383,
420, 499, 424, 504, 497, 449, 406, 492, 470, 418, 407, 403, 362,
393, 368, 423, 448, 511, 511, 423, 470, 453, 429, 439, 425, 431,
443, 480, 452, 472, 406, 460, 436, 574, 456, 399, 476, 423, 501,
399, 459, 396, 409, 423, 399, 383, 433, 436, 413, 403, 414, 410,
337, 472, 448, 487, 442, 475, 410, 478, 483, 374, 414, 514, 422,
409, 455, 464, 362, 461, 356, 464, 456, 494, 348, 464, 432, 398,
426, 418, 429, 516, 363, 455, 413, 388, 508, 381, 439, 330, 385,
393, 454), SEX = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor")), row.names = c(NA, 100L), class = "data.frame", .Names = c("PHHWT14", "SEX"))
Using `xtabs`:
xtabs(PHHWT14 ~ SEX, dta)
I get:
SEX
Female Male
10115 33490
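For reference, a weighted table like this is just the sum of the weights within each level of the grouping factor. A minimal base-R check on a tiny made-up data frame (the names `toy`, `g` and `w` are hypothetical, not from the question):

```r
# Tiny hypothetical data: a grouping factor g and a weight column w
toy <- data.frame(
  g = factor(c("a", "b", "a", "b", "b")),
  w = c(10, 20, 30, 40, 50)
)

# Weighted table: the left-hand side of the formula supplies the weights
tab <- xtabs(w ~ g, data = toy)

# The same totals computed directly as per-group sums of the weights
sums <- tapply(toy$w, toy$g, sum)

print(tab)
print(sums)
```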
Is there a way to summarise using the weights? Something like this only counts rows:
library(dplyr)
dta %>%
group_by(SEX) %>%
summarise(n())
Sum the weight column within each group instead of counting rows:
dta %>%
group_by(SEX) %>%
summarise(sum(PHHWT14))
# SEX sum(PHHWT14)
# 1 Female 10115
# 2 Male 33490
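Since `n()` counts rows, another option is dplyr's `count()`, whose `wt` argument sums a weight column instead of counting rows; `tally()` accepts the same argument after `group_by()`. A minimal sketch on hypothetical toy data standing in for `dta` (the result column is named `n` by default):

```r
library(dplyr)

# Hypothetical toy data standing in for dta: SEX plus a weight column
toy <- data.frame(
  SEX = factor(c("Male", "Female", "Male", "Male")),
  PHHWT14 = c(400, 500, 450, 350)
)

# count() with wt = sums the weights per group instead of counting rows
weighted_counts <- toy %>% count(SEX, wt = PHHWT14)

# tally() takes the same wt argument after an explicit group_by()
weighted_tally <- toy %>% group_by(SEX) %>% tally(wt = PHHWT14)

print(weighted_counts)
print(weighted_tally)
```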
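You can also use `summarise_each`. For your example it gives the same result as the `summarise` version, but it is very helpful if you have other columns you want to summarise.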
dta %>%
group_by(SEX) %>%
summarise_each(funs(sum))
## Source: local data frame [2 x 2]
##
## SEX PHHWT14
## 1 Female 10115
## 2 Male 33490
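In current dplyr releases `summarise_each()` and `funs()` are deprecated; the usual replacement is `summarise()` with `across()`. A sketch of the same per-group sum of every non-grouping column, on hypothetical toy data so it runs on its own (`other` is an invented extra column):

```r
library(dplyr)

# Hypothetical toy data: a grouping factor and two numeric columns
toy <- data.frame(
  SEX = factor(c("Male", "Female", "Male", "Female")),
  PHHWT14 = c(400, 500, 450, 300),
  other = c(1, 2, 3, 4)
)

# across(everything(), sum) applies sum() to every non-grouping column,
# as summarise_each(funs(sum)) did
totals <- toy %>%
  group_by(SEX) %>%
  summarise(across(everything(), sum))

print(totals)
```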
You asked about grouping by a variable, but you can also weight the summaries themselves.
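In general, if you have a numeric weight variable or a grossing-up factor, you can use the dot (`.`, standing for the column being summarised) to add extra arguments to the `sum()` function.
Try it with dplyr on the `iris` data frame: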
library(dplyr)
set.seed(1234)
df <- iris
df[,"weights"] <- rnorm(nrow(df),1,0.1 ) # generate randomized weights
head(df)
df %>%
group_by(Species) %>%
summarise_each(funs(sum(. * weights , na.rm = TRUE), # Weighted Sum
weighted.mean(.,w = weights, na.rm = TRUE))) -> agg.df # Weighted Mean
agg.df
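With current dplyr the same idea can be written with `across()`. This sketch leaves the `weights` column out of the summarised columns (`summarise_each()` above also summarises `weights` itself) and relies on `weighted.mean(x, w)` being `sum(x * w) / sum(w)`. The `wsum` / `wmean` suffixes are arbitrary names chosen here, not part of the original answer:

```r
library(dplyr)

set.seed(1234)
df <- iris
df$weights <- rnorm(nrow(df), 1, 0.1)  # randomized weights, as in the answer

agg.df <- df %>%
  group_by(Species) %>%
  summarise(across(
    -weights,  # don't summarise the weight column itself
    list(
      wsum  = ~ sum(.x * weights, na.rm = TRUE),              # weighted sum
      wmean = ~ weighted.mean(.x, w = weights, na.rm = TRUE)  # weighted mean
    )
  ))

agg.df
```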