Choosing duplicates according to maximum (for multiple columns) in R
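So I have a dataset with several duplicated IDs, and I want to create a dataset that keeps the maximum of each of several value columns. For example: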
ID Value1 Value2 Value3 Gender Race
1 45 76 87 M B
1 34 45 95 M B
2 67 100 92 F W
2 43 70 89 F W
3 34 95 80 F A
3 22 41 90 F A
4 78 25 7 M W
4 32 37 13 M W
5 56 105 25 M B
5 80 59 45 M B
should become:
ID Value1 Value2 Value3 Gender Race
1 45 76 95 M B
2 67 100 92 F W
3 34 95 90 F A
4 78 37 13 M W
5 80 105 45 M B
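I have a feeling it has something to do with the summarize command (though there are 40 value variables, so I'm dreading writing a line of code for each one) or with one of the solutions offered here, which I don't know how to adapt to my needs: Remove duplicates keeping entry with largest absolute value
Any help is appreciated!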
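You can group by ID, Gender and Race and summarise the Value variables to get their maxima. across(starts_with('Value')) picks up every Value column at once, so there is no need to write a line per variable.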
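library(dplyr)

# df is the data from the question
df %>%
  group_by(ID, Gender, Race) %>%
  # starts_with('Value') selects every Value column, however many there are
  summarise(across(starts_with('Value'), ~ max(.x, na.rm = TRUE)), .groups = "drop")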
# ID Gender Race Value1 Value2 Value3
# <int> <chr> <chr> <int> <int> <int>
#1 1 M B 45 76 95
#2 2 F W 67 100 92
#3 3 F A 34 95 90
#4 4 M W 78 37 13
#5 5 M B 80 105 45
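If dplyr is not available, the same idea can be sketched in base R. This assumes, as in the question, that all value columns share the "Value" prefix; value_cols and res are illustrative names. startsWith() selects the columns by prefix, so the call does not grow with the number of variables, and na.rm = TRUE is forwarded to max() just as in the dplyr version.

```r
# Example data from the question (three Value columns; the real data has ~40)
df <- data.frame(
  ID     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Value1 = c(45, 34, 67, 43, 34, 22, 78, 32, 56, 80),
  Value2 = c(76, 45, 100, 70, 95, 41, 25, 37, 105, 59),
  Value3 = c(87, 95, 92, 89, 80, 90, 7, 13, 25, 45),
  Gender = c("M", "M", "F", "F", "F", "F", "M", "M", "M", "M"),
  Race   = c("B", "B", "W", "W", "A", "A", "W", "W", "B", "B")
)

# Select every column whose name starts with "Value" (assumed naming scheme)
value_cols <- names(df)[startsWith(names(df), "Value")]

# Column-wise maximum within each ID/Gender/Race group;
# na.rm = TRUE is passed on to max() through aggregate()'s `...`
res <- aggregate(df[value_cols], by = df[c("ID", "Gender", "Race")],
                 FUN = max, na.rm = TRUE)

# Do not rely on aggregate()'s group order; sort by ID explicitly
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

Each row of the result combines the per-column maxima for one ID, so it need not match any single original row.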
You can use the aggregate function as follows:
# Reproduce the example data
df <- data.frame(ID     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
                 Value1 = c(45, 34, 67, 43, 34, 22, 78, 32, 56, 80),
                 Value2 = c(76, 45, 100, 70, 95, 41, 25, 37, 105, 59),
                 Value3 = c(87, 95, 92, 89, 80, 90, 7, 13, 25, 45),
                 Gender = c("M", "M", "F", "F", "F", "F", "M", "M", "M", "M"),
                 Race   = c("B", "B", "W", "W", "A", "A", "W", "W", "B", "B"))

# Take the maximum of every column within each ID
aggregate(df, by = list(df$ID), max)
#> Group.1 ID Value1 Value2 Value3 Gender Race
#> 1 1 1 45 76 95 M B
#> 2 2 2 67 100 92 F W
#> 3 3 3 34 95 90 F A
#> 4 4 4 78 37 13 M W
#> 5 5 5 80 105 45 M B
Created on 2022-05-30 by the reprex package (v2.0.1)
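A variant of the aggregate() call above, sketched with the formula interface: grouping on ID, Gender and Race keeps them as key columns instead of taking max() of the strings, and avoids the duplicated Group.1/ID columns. The "." on the left-hand side stands for every column not used as a grouping term, so the Value columns never have to be named; res is an illustrative name.

```r
# Example data from the question
df <- data.frame(
  ID     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Value1 = c(45, 34, 67, 43, 34, 22, 78, 32, 56, 80),
  Value2 = c(76, 45, 100, 70, 95, 41, 25, 37, 105, 59),
  Value3 = c(87, 95, 92, 89, 80, 90, 7, 13, 25, 45),
  Gender = c("M", "M", "F", "F", "F", "F", "M", "M", "M", "M"),
  Race   = c("B", "B", "W", "W", "A", "A", "W", "W", "B", "B")
)

# "." = all columns except the grouping terms ID, Gender and Race
res <- aggregate(. ~ ID + Gender + Race, data = df, FUN = max)

# Sort by ID so the rows line up with the expected output
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

One difference to keep in mind: the formula method drops any row containing an NA before grouping (its default na.action is na.omit). If some Value columns have missing entries, passing na.action = na.pass together with na.rm = TRUE should keep those rows and ignore only the missing values.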