R - ddply(): Using min value of one column to find the corresponding value in different column
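I would like to get a summary of the minimum cost per country over the years, together with the specific airport that cost belongs to. The data set looks like this (about 1000 rows, with multiple airports per country):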
airport country cost year
ORD US 500 2010
SFO US 800 2010
LHR UK 250 2010
CDG FR 300 2010
FRA GR 200 2010
ORD US 650 2011
SFO US 500 2011
LHR UK 850 2011
CDG FR 350 2011
FRA GR 150 2011
ORD US 250 2012
SFO US 650 2012
LHR UK 350 2012
CDG FR 450 2012
FRA GR 100 2012
The code below gives me the minimum cost for each country:
ddply(df,c('country'), summarize, LowestCost = min(cost))
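But when I try to show each country's minimum cost together with the corresponding airport, the same airport is listed for every country: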
ddply(df,c('country'), summarize, LowestCost = min(cost), AirportName = df[which.min(df[,3]),1])
The output should look like this:
country LowestCost AirportName
US 250 ORD
UK 250 LHR
FR 300 CDG
GR 100 FRA
But instead it looks like this:
country LowestCost AirportName
US 250 ORD
UK 250 ORD
FR 300 ORD
GR 100 ORD
Any help is appreciated.
We can use slice_min from dplyr:
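library(dplyr)
df %>%
  select(-year) %>%              # year is not needed in the result
  group_by(country) %>%          # one group per country
  slice_min(cost, n = 1) %>%     # keep the row(s) with the lowest cost
  ungroup() %>%
  rename(LowestCost = cost)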
-output
# A tibble: 4 x 3
airport country LowestCost
<chr> <chr> <int>
1 CDG FR 300
2 FRA GR 100
3 LHR UK 250
4 ORD US 250
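Note that slice_min() keeps ties by default (with_ties = TRUE), so a country whose lowest cost is shared by two airports returns two rows. A minimal sketch with a hypothetical toy data frame ties_df (invented values, not the question's data) shows the effect of with_ties = FALSE, which returns exactly one row per group:

```r
library(dplyr)

# Hypothetical toy data: the two US airports share the minimum cost of 250
ties_df <- data.frame(
  airport = c("ORD", "SFO", "LHR"),
  country = c("US", "US", "UK"),
  cost    = c(250L, 250L, 300L),
  stringsAsFactors = FALSE
)

# Default behaviour: every row tied at the minimum is kept (US appears twice)
with_ties <- ties_df %>%
  group_by(country) %>%
  slice_min(cost, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per country
one_per_group <- ties_df %>%
  group_by(country) %>%
  slice_min(cost, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)      # 3
nrow(one_per_group)  # 2
```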
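In the plyr code, df[which.min(df[,3]),1] indexes the whole df, so which.min runs over the entire cost column instead of the rows of the current group, and every country gets the same airport. We only need to refer to the column names directly, so that they are evaluated within each country's subset: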
plyr::ddply(df, c("country"), plyr::summarise,
LowestCost = min(cost), AirportName = airport[which.min(cost)])
country LowestCost AirportName
1 FR 300 CDG
2 GR 100 FRA
3 UK 250 LHR
4 US 250 ORD
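For illustration, here is a base R sketch (not part of the original answer) that mimics the split-apply step ddply performs, using a hypothetical toy data frame toy. It contrasts a lookup on the whole column with one done inside each country's subset:

```r
# Hypothetical toy data (not the question's data set)
toy <- data.frame(
  airport = c("ORD", "SFO", "LHR", "CDG"),
  country = c("US", "US", "UK", "FR"),
  cost    = c(500L, 250L, 300L, 100L),
  stringsAsFactors = FALSE
)

# Whole-column lookup: which.min() sees all rows, so the result is the
# same airport regardless of which country is being summarised
global_pick <- toy$airport[which.min(toy$cost)]

# Per-group lookup: split by country, then apply which.min() to each piece,
# which is what airport[which.min(cost)] does inside ddply
per_group <- sapply(split(toy, toy$country),
                    function(d) d$airport[which.min(d$cost)])

global_pick  # "CDG"
per_group    # FR = "CDG", UK = "LHR", US = "SFO"
```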
Data
df <- structure(list(airport = c("ORD", "SFO", "LHR", "CDG", "FRA",
"ORD", "SFO", "LHR", "CDG", "FRA", "ORD", "SFO", "LHR", "CDG",
"FRA"), country = c("US", "US", "UK", "FR", "GR", "US", "US",
"UK", "FR", "GR", "US", "US", "UK", "FR", "GR"), cost = c(500L,
800L, 250L, 300L, 200L, 650L, 500L, 850L, 350L, 150L, 250L, 650L,
350L, 450L, 100L), year = c(2010L, 2010L, 2010L, 2010L, 2010L,
2011L, 2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L,
2012L)), class = "data.frame", row.names = c(NA, -15L))