Subsetting a list of data frames based on a ranked column in R
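I have a list of data frames. I want to keep only those data frames that contain a row whose score is less than one tenth of the second-ranked score, and drop all the other data frames. Any idea how to approach this? Thanks!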
>Output
$E1
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
$E2
ID model score
E2 XXX 130
E2 ASD 144
E2 DFE 142
E2 FGS 145
E2 GFH 124
Desired result:
>Output_subset
$E1
ID model score
E1 AAA 2
E1 BBB 100
E1 CCC 130
E1 ZZZ 120
E1 YYY 128
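You can write a function that checks the condition between the two lowest scores:
check_data <- function(df) {
  # Sort the scores in ascending order
  x <- sort(df$score)
  # TRUE if the lowest score is less than one tenth of the second lowest
  x[1] < (x[2]/10)
}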
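To see what the check does on the sample data, the sketch below applies the same comparison directly to the two score vectors from the question. The helper name `lowest_two` is hypothetical and only used here for illustration.

```r
# Scores copied from the example list in the question
scores_e1 <- c(2L, 100L, 130L, 120L, 128L)
scores_e2 <- c(130L, 144L, 142L, 145L, 124L)

# Hypothetical helper: the two smallest scores, in ascending order
lowest_two <- function(scores) sort(scores)[1:2]

lowest_two(scores_e1)  # 2 100
lowest_two(scores_e2)  # 124 130

# Same comparison as check_data(): is the lowest score below one tenth
# of the second-lowest score?
e1_passes <- lowest_two(scores_e1)[1] < lowest_two(scores_e1)[2] / 10  # 2 < 10
e2_passes <- lowest_two(scores_e2)[1] < lowest_two(scores_e2)[2] / 10  # 124 < 13
```

Here `e1_passes` is TRUE and `e2_passes` is FALSE, which is why only E1 appears in the desired result.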
You can use this function with Filter in base R:
Filter(check_data, Output)
#$E1
# ID model score
#1 E1 AAA 2
#2 E1 BBB 100
#3 E1 CCC 130
#4 E1 ZZZ 120
#5 E1 YYY 128
Or with keep from purrr:
purrr::keep(Output, check_data)
Data
Output <- list(E1 = structure(list(ID = c("E1", "E1", "E1", "E1", "E1"),
model = c("AAA", "BBB", "CCC", "ZZZ", "YYY"), score = c(2L,
100L, 130L, 120L, 128L)), class = "data.frame", row.names = c(NA,
-5L)), E2 = structure(list(ID = c("E2", "E2", "E2", "E2", "E2"
), model = c("XXX", "ASD", "DFE", "FGS", "GFH"), score = c(130L,
144L, 142L, 145L, 124L)), class = "data.frame", row.names = c(NA, -5L)))
We can also use sapply in base R:
Output[sapply(Output, function(x)
  with(head(x[order(x$score), ], 2), score[1] < (score[2]/10)))]
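One case the sample data does not exercise is a data frame with fewer than two non-missing scores: the comparison then returns NA, and indexing a list with an NA logical yields a NULL entry rather than dropping the element. The following is a minimal sketch of a guarded variant, not part of the original answers; the names `check_data_safe` and `demo` are hypothetical.

```r
# Guarded variant of the check (hypothetical name)
check_data_safe <- function(df) {
  x <- sort(df$score)               # sort() drops NA values by default
  if (length(x) < 2) return(FALSE)  # nothing to compare against
  x[1] < x[2] / 10
}

# Hypothetical example list: `ok` satisfies the condition, `short` has one row
demo <- list(
  ok    = data.frame(ID = "A", model = c("m1", "m2", "m3"), score = c(1, 50, 60)),
  short = data.frame(ID = "B", model = "m1", score = 5)
)

# vapply() guarantees exactly one logical value per data frame
kept <- demo[vapply(demo, check_data_safe, logical(1))]
names(kept)  # "ok"
```

Using `vapply` with `logical(1)` instead of `sapply` makes the function fail loudly if the check ever returns something other than a single TRUE or FALSE.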