dplyr: pipes inside summarise after group_by
I have this data.frame:
df_test = structure(list(`MAE %` = c(-0.0647202646339709, -0.126867775585001,
-1.81159420289855, -1.03092783505155, -2.0375491194877, -0.160783192796913,
-0.585827216261999, -0.052988554472234, -0.703351261894911, -0.902996305924203,
-0.767676767676768, -0.0101091791346543, -0.0134480903711673,
-0.229357798165138, -0.176407935028625, -0.627062706270627, -1.75706139769261,
-1.23024009524439, -0.257391763463569, -0.878347259688137, -0.123613523987705,
-1.65711947626841, -2.11718534838887, -0.256285931980328, -1.87152777777778,
-0.0552333609500138, -0.943983402489627, -0.541095890410959,
-0.118607409474639, -0.840453845076341), Profit = c(7260, 2160,
-7080, 3600, -8700, 6300, -540, 10680, -1880, -3560, -720, 5400,
5280, 1800, 11040, -240, -2320, 2520, 10300, -2520, 8400, -9240,
-5190, 7350, -6790, 3600, -3240, 8640, 7150, -2400)), .Names = c("MAE %",
"Profit"), row.names = c(NA, 30L), class = "data.frame")
Now I want some summary statistics, for example:
df_test %>%
  group_by(win.g = Profit > 0) %>%
  summarise(GroupCnt = n(),
            TopMAE = filter(`MAE %` > -1) %>% sum(Profit),
            BottomMAE = filter(`MAE %` <= -1) %>% sum(Profit))
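So the data is grouped by whether Profit > 0 or Profit <= 0. Within each group, I then want the sum() of Profit for the rows where MAE % > -1 and for the rows where MAE % <= -1. The TopMAE and BottomMAE calculations must respect the grouping.
The expected result is:
#   win.g GroupCnt TopMAE BottomMAE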
#1 FALSE 14 -15100 -39320
#2 TRUE 16 95360 6120
But my R code does not work. I get an error:
Error: no applicable method for 'filter_' applied to an object of class "logical"
I changed my code in response to the error:
df_test %>%
  group_by(win.g = Profit > 0) %>%
  summarise(UnderStop = n(),
            TopMAE = filter(., `MAE %` > -1) %>% sum(Profit),
            BottomMAE = filter(., `MAE %` <= -1) %>% sum(Profit))
But I still get no result, only another error:
Error: incorrect length (14), expecting: 16
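I have tried to understand the grouping behaviour and how to use pipes inside summarise after grouping, but without success. I have spent a whole day on it.
How can I get the expected result table? Please help me understand dplyr's logic when grouping and then computing functions over those groups.
Is this what you are looking for? (Just asking, because I get a different result from your output.)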
df_test %>%
  group_by(win.g = Profit > 0) %>%
  summarise(GroupCnt = n(),
            TopMAE = sum(Profit[`MAE %` > -1]),
            BottomMAE = sum(Profit[`MAE %` <= -1]))
#Source: local data frame [2 x 4]
#  win.g GroupCnt TopMAE BottomMAE
#  (lgl)    (int)  (dbl)     (dbl)
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120
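To make it clearer why logical subsetting works where filter() did not, here is a minimal sketch on a small made-up data frame (`toy`, `mae`, `profit` and the output column names are hypothetical, not taken from the question). As far as I understand it, inside summarise() a column name refers to the plain vector of that column's values for the current group, so filter() has no data frame to work on, while `.` in a pipe refers to the whole piped data frame rather than the current group. Indexing the group's vector with a logical condition avoids both problems:

```r
library(dplyr)

# Hypothetical toy data, standing in for df_test
toy <- data.frame(
  mae    = c(-0.5, -1.5, -0.2, -2.0, -0.9, -1.1),
  profit = c( 100,  -50,  200, -300,  -20,   40)
)

res <- toy %>%
  group_by(win = profit > 0) %>%
  summarise(
    n_rows     = n(),                      # rows in the current group
    # profit and mae are per-group vectors here, so logical
    # indexing keeps only the matching rows of this group
    top_sum    = sum(profit[mae > -1]),
    bottom_sum = sum(profit[mae <= -1])
  )

print(res)
```

Each expression in summarise() has to return a single value per group, which sum() over the subsetted vector does.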
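Personally, I prefer to approach a problem like this by recognising that you are performing a grouping operation on two dimensions, while your code only uses one. Here is an example that does the same work on two dimensions. It is a little more code than what @Sotos provided, but it gives the same result he got.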
library(dplyr)
library(tidyr)
df_test %>%
  #* Group on two dimensions
  group_by(win.g = Profit > 0,
           top = ifelse(`MAE %` > -1, "TopMAE", "BottomMAE")) %>%
  summarise(GroupCnt = n(),
            SumProfit = sum(Profit)) %>%
  ungroup() %>%
  #* Collapse the GroupCnt
  group_by(win.g) %>%
  mutate(GroupCnt = sum(GroupCnt)) %>%
  ungroup() %>%
  #* From long to wide
  spread(top, SumProfit)
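For readers on a newer tidyr, the same two-dimensional idea can be sketched with pivot_wider() in place of spread(). This is only an illustration under assumptions: it uses a small made-up data frame (`toy`, `mae`, `profit` and `band` are hypothetical names) and assumes dplyr >= 1.0.0 for the `.groups` argument and tidyr >= 1.0.0 for pivot_wider().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, standing in for df_test
toy <- data.frame(
  mae    = c(-0.5, -1.5, -0.2, -2.0, -0.9, -1.1),
  profit = c( 100,  -50,  200, -300,  -20,   40)
)

wide <- toy %>%
  # Group on two dimensions: win/loss and MAE band
  group_by(win  = profit > 0,
           band = ifelse(mae > -1, "top_sum", "bottom_sum")) %>%
  summarise(n_rows     = n(),
            sum_profit = sum(profit),
            .groups    = "drop") %>%
  # Collapse the per-band counts into one count per win/loss group
  group_by(win) %>%
  mutate(n_rows = sum(n_rows)) %>%
  ungroup() %>%
  # From long to wide: one column per MAE band
  pivot_wider(names_from = band, values_from = sum_profit)

print(wide)
```

The long-then-wide shape scales more easily if further MAE bands are added later, since only the ifelse() labelling needs to change.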