Filtering for minimum values in multiple columns in R

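Apologies in advance if this question is not formatted correctly. I am new to R and to the SO community, and I welcome constructive criticism. I have a data frame that looks like this, and I am trying to filter it so that it contains only the minimum values of 'Cars' and 'Houses' for each person.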

my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
                     "Cars" = c(2, 3, NA, NA, 4, 1),
                     "Houses" = c(NA, NA, 4, 3, 2, NA))
#   Name Cars Houses
#1  Dora    2     NA
#2  Dora    3     NA
#3  John   NA      4
#4  John   NA      3
#5 Marie    4      2
#6 Marie    1     NA

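I would like to end up with something like this (note in particular that the Marie rows have been combined, although it is also fine if Marie stays split across 2 separate rows):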

#  Name Cars Houses
#  Dora    2     NA
#  John   NA      3
# Marie    1      2

Or like this:

#  Name Cars Houses
#  Dora    2     NA
#  John   NA      3
# Marie   NA      2
# Marie    1     NA

Based on other answers, I tried

library(dplyr)
my_data %>%
  group_by(Name) %>%
  filter(Cars == min(Cars))
#  Name Cars Houses
#  Dora    2     NA
# Marie    1     NA

But this removes the John rows before I can filter for the minimum of Houses. Does anyone have suggestions for how to handle this? Thanks in advance.

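We can use summarise to get the minimum of each column for each name. Note that when every value in a group is NA, min(..., na.rm = TRUE) returns Inf (with a warning), which is why Inf appears in the output: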

my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"), 
"Cars" = c(2, 3, NA, NA, 4, 1), 
"Houses" = c(NA, NA, 4, 3, 2, NA))

library(dplyr)
my_data %>% 
  group_by(Name) %>% 
  summarise(Cars = min(Cars, na.rm = TRUE),
            Houses = min(Houses, na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
  Name   Cars Houses
  <chr> <dbl>  <dbl>
1 Dora      2    Inf
2 John    Inf      3
3 Marie     1      2
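
If NA is preferred over Inf for groups that have no non-missing values (as in the desired output in the question), one possible variation is sketched below. The helper name safe_min is made up for this example and is not part of dplyr; the sketch also assumes dplyr 1.0.0 or later, since it relies on across() and the .groups argument.

```r
library(dplyr)

my_data <- data.frame(
  Name   = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
  Cars   = c(2, 3, NA, NA, 4, 1),
  Houses = c(NA, NA, 4, 3, 2, NA)
)

# Hypothetical helper (not a dplyr function): returns NA instead of Inf
# when a group contains only missing values.
safe_min <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

result <- my_data %>%
  group_by(Name) %>%
  summarise(across(c(Cars, Houses), safe_min), .groups = "drop")

print(result)
```

Because the helper checks for all-NA groups before calling min(), this version should also avoid the "no non-missing arguments to min" warnings.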

Here is what you can do in base R:

df <- data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"), 
                     "Cars" = c(2, 3, NA, NA, 4, 1), 
                     "Houses" = c(NA, NA, 4, 3, 2, NA), stringsAsFactors = FALSE)

aggregate(df, list(df$Name), FUN = function(x) min(x, na.rm = TRUE))[,-1]

Output

   Name Cars Houses
1  Dora    2    Inf
2  John  Inf      3
3 Marie    1      2
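
As a possible follow-up, the Inf values can be converted back to NA after aggregating. The sketch below is only one way to do it: it aggregates just the numeric columns (so Name is used purely as the grouping key), wraps the call in suppressWarnings() on the assumption that the all-NA warnings from min() are expected here, and introduces the names res and num_cols for illustration.

```r
df <- data.frame(
  Name   = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
  Cars   = c(2, 3, NA, NA, 4, 1),
  Houses = c(NA, NA, 4, 3, 2, NA),
  stringsAsFactors = FALSE
)

num_cols <- c("Cars", "Houses")

# Group by Name and take the minimum of each numeric column.
# min() warns and returns Inf for all-NA groups, so the warnings are silenced.
res <- suppressWarnings(
  aggregate(df[num_cols], by = list(Name = df$Name),
            FUN = function(x) min(x, na.rm = TRUE))
)

# Replace the Inf placeholders with NA.
res[num_cols] <- lapply(res[num_cols],
                        function(col) replace(col, is.infinite(col), NA))

print(res)
```

Note that the formula interface, aggregate(cbind(Cars, Houses) ~ Name, ...), is not used here because its default na.action drops every row containing an NA before grouping.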