根据 R 中的最大值(对于多列)选择重复项

Choosing duplicates according to maximum (for multiple columns) in R

所以我有一个包含多个重复项的数据集,我想创建一个数据集来选择多个值中的最大值。例如:

  ID         Value1   Value2 Value3  Gender  Race
   1          45     76      87        M      B   
   1          34     45      95        M      B
   2          67     100     92        F      W
   2          43     70      89        F      W
   3          34     95      80        F      A
   3          22     41      90        F      A
   4          78     25       7        M      W
   4          32     37      13        M      W
   5          56     105     25        M      B
   5          80     59      45        M      B

会变成这样:

  ID         Value1   Value2 Value3  Gender  Race
   1          45     76      95        M      B   
   2          67     100     92        F      W
   3          34     95      90        F      A
   4          78     56      13        M      W
   5          80     105     45        M      B

我感觉它与 summarize 命令有关(虽然有 40 个值变量,所以我害怕为每个变量写一行代码)或这里提供的一些解决方案(我不知道知道如何根据我的需要进行修改):Remove duplicates keeping entry with largest absolute value

感谢任何帮助!

您可以按 IDGenderRace 分组并汇总 Value 个变量以获得它们的最大值。

library(dplyr)

df %>%
  group_by(ID, Gender, Race) %>%
  summarise(across(starts_with('Value'), max, na.rm = TRUE), .groups = "drop")

#     ID Gender Race  Value1 Value2 Value3
#  <int> <chr>  <chr>  <int>  <int>  <int>
#1     1 M      B         45     76     95
#2     2 F      W         67    100     92
#3     3 F      A         34     95     90
#4     4 M      W         78     37     13
#5     5 M      B         80    105     45

您可以使用aggregate功能如下,

df <- data.frame(ID = c(1,1,2,2,3,3,4,4,5,5) , 
                 Value1 = c(45,34,67,43,34,22,78,32,56,80) , 
                 Value2 = c(76,45,100,70,95,41,25,37,105,59) ,
                 Value3 = c(87,95,92,89,80,90,7,13,25,45) ,
                 Gender = c("M","M","F","F","F","F","M","M","M","M") ,
                 Race = c("B","B","W","W","A","A","W","W","B","B"))

aggregate(df , by = list(df$ID) , max)
#>   Group.1 ID Value1 Value2 Value3 Gender Race
#> 1       1  1     45     76     95      M    B
#> 2       2  2     67    100     92      F    W
#> 3       3  3     34     95     90      F    A
#> 4       4  4     78     37     13      M    W
#> 5       5  5     80    105     45      M    B

reprex package (v2.0.1)

于 2022-05-30 创建