Filtering for minimum values in multiple columns in R
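Apologies in advance if this question is not formatted correctly; I'm quite new to R and the SO community, and I welcome constructive criticism. I have a data frame that looks like this, and I'm trying to filter it so that it contains only each person's minimum 'Cars' and 'Houses' values.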
my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
"Cars" = c(2, 3, NA, NA, 4, 1),
"Houses" = c(NA, NA, 4, 3, 2, NA))
#Name Cars Houses
#1 Dora 2 NA
#2 Dora 3 NA
#3 John NA 4
#4 John NA 3
#5 Marie 4 2
#6 Marie 1 NA
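I'd like to end up with something like this (note in particular that the Marie rows have been combined, though it would also be fine if they stayed as 2 separate rows):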
#Name Cars Houses
#Dora 2 NA
#John NA 3
#Marie 1 2
Or like this:
#Name Cars Houses
#Dora 2 NA
#John NA 3
#Marie NA 2
#Marie 1 NA
Based on other answers, I tried:
library(dplyr)
my_data %>%
group_by(Name) %>%
filter(Cars == min(Cars))
#Name Cars Houses
#Dora 2 NA
#Marie 1 NA
But this drops the John rows before I can filter for the minimum Houses. Does anyone have suggestions for how to handle this? Thanks in advance.
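The John rows disappear because of how NA propagates. A minimal base-R illustration using John's Cars values (the variable names are illustrative only): min() returns NA when its input contains NA, any comparison with NA yields NA, and filter() keeps only rows whose condition is TRUE. The sketch mimics that last step with which().

```r
# John's Cars values: every entry is missing
john_cars <- c(NA_real_, NA_real_)

m <- min(john_cars)     # NA: min() propagates NA unless na.rm = TRUE
cond <- john_cars == m  # c(NA, NA): any comparison with NA is NA

# which() keeps only TRUE positions, mirroring how filter() drops NA conditions
kept <- john_cars[which(cond)]
length(kept)            # 0: no John rows survive
```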
We can use summarise() to get the minimum of each column for each name:
my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
"Cars" = c(2, 3, NA, NA, 4, 1),
"Houses" = c(NA, NA, 4, 3, 2, NA))
library(dplyr)
my_data %>%
group_by(Name) %>%
summarise(Cars = min(Cars, na.rm = TRUE),
Houses = min(Houses, na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
Name Cars Houses
<chr> <dbl> <dbl>
1 Dora 2 Inf
2 John Inf 3
3 Marie 1 2
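The Inf values come from min(x, na.rm = TRUE) on a group whose values are all NA (R also emits a warning in that case). If you want NA there instead, as in the desired output, one option is a small wrapper. This is a sketch assuming dplyr >= 1.0.0 (for across() and the .groups argument); min_or_na is a hypothetical helper, not part of dplyr.

```r
library(dplyr)

my_data <- data.frame(Name = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
                      Cars = c(2, 3, NA, NA, 4, 1),
                      Houses = c(NA, NA, 4, 3, 2, NA))

# Hypothetical helper: like min(x, na.rm = TRUE), but returns NA
# (instead of Inf plus a warning) when every value in the group is NA
min_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

# Apply the helper to both value columns within each Name group;
# .groups = "drop" returns an ungrouped result and silences the message
result <- my_data %>%
  group_by(Name) %>%
  summarise(across(c(Cars, Houses), min_or_na), .groups = "drop")

result
```

This gives one row per name: Dora 2/NA, John NA/3 and Marie 1/2, which matches the first layout in the question.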
Here is how you can do it in base R:
df <- data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
"Cars" = c(2, 3, NA, NA, 4, 1),
"Houses" = c(NA, NA, 4, 3, 2, NA), stringsAsFactors = FALSE)
aggregate(df, list(df$Name), FUN = function(x) min(x, na.rm = TRUE))[,-1]
Output:
Name Cars Houses
1 Dora 2 Inf
2 John Inf 3
3 Marie 1 2
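The same NA-instead-of-Inf idea in base R. This sketch swaps in the formula interface of aggregate() rather than the data-frame call above, so Name is used only for grouping and is not passed through min(). The na.action = na.pass argument is needed because the formula method drops rows containing NA by default, which here would discard every row except Marie's 4/2 row. min_or_na is again a hypothetical helper.

```r
df <- data.frame(Name = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
                 Cars = c(2, 3, NA, NA, 4, 1),
                 Houses = c(NA, NA, 4, 3, 2, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: minimum ignoring NA, or NA if the group is all NA
min_or_na <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)

# Aggregate Cars and Houses by Name; na.pass keeps rows that contain NA
res <- aggregate(cbind(Cars, Houses) ~ Name, data = df,
                 FUN = min_or_na, na.action = na.pass)
res
```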