Unable to create a grouped summary dataset in R
I'm having trouble creating grouped summary statistics.
Below is the code I'm using to create the summary dataset:
library(dplyr)
#sample dataset
D A B C VAL PD
Agriculture Services Bought with Cash 01OCT2014 10 0.4435714
Agriculture Grain Bought with Cash 01OCT2014 10 0.7266667
Agriculture Livestock Bought with Cash 01OCT2014 10 1.1372414
Agriculture Fr, ve Bought with Cash 01OCT2014 10 1.5170370
Agriculture Livestock Financed 01OCT2014 76 1.1372414
Agriculture Fr, ve Financed 01OCT2014 76 1.5170370
Agriculture Grain Financed 01OCT2014 76 0.7266667
Agriculture Services Financed 01OCT2014 76 0.4435714
Agriculture Services Insurance 01OCT2014 10 0.4435714
Agriculture Livestock Insurance 01OCT2014 10 1.1372414
groupDF<-select.other %>%
group_by(.dots=c("A","B","C")) %>%
summarize(PD=mean(PD),VAL=mean(VAL))
I want a dataset with the mean PD and mean VAL grouped by A, B, and C:
A B C PD VAL
Services Bought with Cash 01OCT2017 1 10
Instead, I get:
PD VAL
0.8574816 6059877
Any help or guidance would be appreciated.
If the column names are given as strings, we can use group_by_at:
library(dplyr)
select.other %>%
group_by_at(vars(c("A","B","C"))) %>%
summarize(PD=mean(PD),VAL=mean(VAL))
# A tibble: 10 x 5
# Groups: A, B [10]
# A B C PD VAL
# <chr> <chr> <chr> <dbl> <dbl>
# 1 Fr, ve Bought with Cash 01OCT2014 1.52 10
# 2 Fr, ve Financed 01OCT2014 1.52 76
# 3 Grain Bought with Cash 01OCT2014 0.727 10
# 4 Grain Financed 01OCT2014 0.727 76
# 5 Livestock Bought with Cash 01OCT2014 1.14 10
# 6 Livestock Financed 01OCT2014 1.14 76
# 7 Livestock Insurance 01OCT2014 1.14 10
# 8 Services Bought with Cash 01OCT2014 0.444 10
# 9 Services Financed 01OCT2014 0.444 76
#10 Services Insurance 01OCT2014 0.444 10
Another option is to convert the strings to symbols with rlang::syms and then unquote-splice them with !!!:
select.other %>%
group_by(!!! rlang::syms(c("A","B","C"))) %>%
summarize(PD=mean(PD),VAL=mean(VAL))
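A third variant, sketched here under assumptions rather than taken from the original answer: with dplyr 1.0.0 or later, across() combined with all_of() accepts a character vector of column names. The data frame df and the vector group_cols below are hypothetical stand-ins for select.other and its grouping columns, with made-up values so the block runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for select.other: same column names, fewer rows,
# illustrative values only.
df <- data.frame(
  A   = c("Grain", "Grain", "Services", "Services"),
  B   = c("Financed", "Financed", "Insurance", "Insurance"),
  C   = rep("01OCT2014", 4),
  VAL = c(76L, 10L, 10L, 10L),
  PD  = c(0.5, 1.5, 0.25, 0.75),
  stringsAsFactors = FALSE
)

# Grouping columns supplied as strings
group_cols <- c("A", "B", "C")

res <- df %>%
  # all_of() selects exactly the named columns and errors if one is missing
  group_by(across(all_of(group_cols))) %>%
  # Namespace-qualified so another package's summarise() cannot mask it
  dplyr::summarise(PD = mean(PD), VAL = mean(VAL), .groups = "drop")

print(res)
```

The explicit dplyr:: prefix is a precaution. One possible cause of a single-row result like the one in the question (an assumption, since the question does not show which packages were attached) is that plyr was loaded after dplyr, in which case summarize resolves to plyr::summarize, which ignores the grouping.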
Data
select.other <- structure(list(D = c("Agriculture", "Agriculture", "Agriculture",
"Agriculture", "Agriculture", "Agriculture", "Agriculture", "Agriculture",
"Agriculture", "Agriculture"), A = c("Services", "Grain", "Livestock",
"Fr, ve", "Livestock", "Fr, ve", "Grain", "Services", "Services",
"Livestock"), B = c("Bought with Cash", "Bought with Cash", "Bought with Cash",
"Bought with Cash", "Financed", "Financed", "Financed", "Financed",
"Insurance", "Insurance"), C = c("01OCT2014", "01OCT2014", "01OCT2014",
"01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014",
"01OCT2014", "01OCT2014"), VAL = c(10L, 10L, 10L, 10L, 76L, 76L,
76L, 76L, 10L, 10L), PD = c(0.4435714, 0.7266667, 1.1372414,
1.517037, 1.1372414, 1.517037, 0.7266667, 0.4435714, 0.4435714,
1.1372414)), class = "data.frame", row.names = c(NA, -10L))