Extract data frame name and column name from a list of data frames when a condition is met in R
I have a list of data frames; here is a small excerpt:
df <- list(Al2O3 = structure(list(Determination_No = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10), `2` = c(2.04, 2.07, 2.05, 2.07, 2.1, 2.08,
NA, NA, NA, NA), `3` = c(2.08, 2.1, 2.08, 2.13, 2.1, 2.08, NA,
NA, NA, NA), `4` = c(2.08, 2.08, 2.09, 2.06, 2.08, 2.07, 2.07,
2.06, 2.08, 2.08), `5` = c(2.11, 2.09, 2.1, 2.08, 2.09, 2.09,
NA, NA, NA, NA), `7` = c(2.06, 2.05, 2.04, 2.05, 2.04, 2.03,
NA, NA, NA, NA), `8` = c(2.078, 2.065, 2.057, 2.063, 2.067, 2.066,
NA, NA, NA, NA), `10` = c(2.191776681, 2.153987428, 2.153987428,
2.097303548, 2.116198175, 2.116198175, NA, NA, NA, NA), `12` = c(2.24,
2.08, 2.12, 2.15, 2.15, 2.15, NA, NA, NA, NA), `36` = c(2.07,
2.082, 2.048, 2.046, 2.086, 2.069, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-10L)), As = structure(list(Determination_No = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10), `2` = c(0.002, 0.001, 0.001, 0.001, 0.002,
0.001, NA, NA, NA, NA), `3` = c(0.003, 0.002, 0.002, 0.002, 0.001,
0.002, NA, NA, NA, NA), `4` = c(0.001, 0.002, 0.001, 0.002, 0.002,
0.002, 0.001, 0.002, 0.002, 0.003), `5` = c(0.002, 0.001, 0.001,
0.001, 0.001, 0.002, NA, NA, NA, NA), `7` = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), `8` = c(NA, 0.001, NA, NA, NA, NA, NA, NA, NA, NA),
`10` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `12` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `36` = c(0.0053, 0.0053, 0.0053,
0.00454, 0.0053, 0.0053, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-10L)), Ba = structure(list(Determination_No = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10), `2` = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
`3` = c(NA, NA, NA, NA, 0.001, NA, NA, NA, NA, NA), `4` = c(0.004,
0.003, 0.003, 0.004, 0.003, 0.002, 0.004, 0.002, 0.005, NA
), `5` = c(NA, NA, NA, NA, NA, 0.003, NA, NA, NA, NA), `7` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `8` = c(0.002, 0.003, NA,
NA, NA, 0.002, NA, NA, NA, NA), `10` = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), `12` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), `36` = c(0.00089566, 0.00089566, 0.00089566, 0.00089566,
0.00089566, 0.00089566, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-10L)))
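I have the following function to calculate the M-score (modified z-score). For each data frame in the list, it computes the median of every lab column and returns the positions of the columns whose M-score exceeds the threshold.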
library(outliers)
scores_na <- function(x, ...) {
not_na <- !is.na(x)
scores <- rep(NA, length(x))
scores[not_na] <- outliers::scores(na.omit(x), ...)
scores
}
MscoreMax <- 3
Mscore <- function(x,...){
labmedians <- mapply(median, x[-1], na.rm = T)
median_of_median <- median(labmedians, na.rm = T)
labMScore <- as.vector(abs(scores_na(labmedians, "mad"))) #calculate mscore by lab
labMScore [is.infinite(labMScore )] <- 0 # make infinity zero
labMScore [is.nan(labMScore )] <- 0 # make NaN zero (0/0 when the MAD is 0)
labMScoreIndex <- which(labMScore > MscoreMax) #get the position in the vector that exceeds Mscoremax
return(labMScoreIndex)
}
Mindex <- lapply(df, Mscore) #Get, for each data frame, the column positions (within x[-1]) where Mscore > MscoreMax
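The M-score step can be written out in base R to see what it does. This is a minimal sketch, assuming that outliers::scores(x, type = "mad") computes (x - median(x)) / mad(x) with mad()'s default 1.4826 scaling; modified_z and the example vectors are hypothetical names used only for illustration. It also shows why the function zeroes Inf and NaN: when enough lab medians coincide that the MAD is 0, the division produces 0/0 or x/0.

```r
# Modified z-score: distance from the median, in units of the scaled MAD.
# stats::mad() multiplies the raw MAD by 1.4826 by default.
modified_z <- function(x) {
  (x - median(x)) / mad(x)
}

# Hypothetical lab medians with one clear outlier in position 5
lab_medians <- c(1, 2, 3, 4, 100)
m_scores <- abs(modified_z(lab_medians))
which(m_scores > 3)
# [1] 5

# If the MAD is 0, the scores become NaN (0/0) or Inf (x/0),
# which is why Mscore() resets them to 0 before thresholding.
modified_z(c(2, 2, 3))
# [1] NaN NaN Inf
```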
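I would like to modify my function so that it creates a data frame containing the data frame name (from the list of data frames) and the column name. These represent the element/compound and the lab ID, which I need to store in order to run additional statistics.
I would like the final output to look like the example below:
Analyte Lab_ID
Al2O3 10
Al2O3 12
As 36
whereas my current function only returns the column positions:
$Al2O3
[1] 7 8
$As
[1] 9
$Ba
integer(0)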
Any help is appreciated.
You can extract the names of the columns where labMScore > MscoreMax, and then use stack to turn the resulting named list into a data frame.
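The key detail is that the scores are computed on x[-1], so their positions are offset by one relative to names(x). A minimal sketch on a hypothetical toy data frame and hypothetical scores (not the real data) showing that the two ways of doing the lookup agree:

```r
# Hypothetical data frame: Determination_No followed by lab-ID columns
x <- data.frame(Determination_No = 1:3, `2` = 1:3, `10` = 1:3, `12` = 1:3,
                check.names = FALSE)

# Hypothetical M-scores, one per lab column, i.e. computed on x[-1]
lab_scores <- c(0.5, 3.5, 4.5)

# Positions are relative to x[-1], so either shift by one in names(x) ...
names(x)[which(lab_scores > 3) + 1]
# [1] "10" "12"

# ... or index names(x[-1]) directly
names(x[-1])[which(lab_scores > 3)]
# [1] "10" "12"
```

When nothing exceeds the threshold, which() returns integer(0) and both lookups return character(0), so an analyte such as Ba simply contributes no rows later on.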
Mscore <- function(x,...){
labmedians <- mapply(median, x[-1], na.rm = T)
median_of_median <- median(labmedians, na.rm = T)
labMScore <- as.vector(abs(scores_na(labmedians, "mad")))
labMScore [is.infinite(labMScore )] <- 0
labMScore [is.nan(labMScore )] <- 0
#Added + 1 because we ignore the 1st column while calculating median
labMScoreIndex <- names(x)[which(labMScore > MscoreMax) + 1]
return(labMScoreIndex)
}
stack(lapply(df, Mscore))[2:1]
# ind values
#1 Al2O3 10
#2 Al2O3 12
#3 As 36
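To get the exact column headers from the question (Analyte, Lab_ID), the stacked result only needs renaming. A minimal sketch, using a hypothetical list flagged as a stand-in for the output of lapply(df, Mscore) with the modified function:

```r
# Hypothetical stand-in for lapply(df, Mscore): one character vector of
# flagged lab IDs per analyte; Ba has none.
flagged <- list(Al2O3 = c("10", "12"), As = "36", Ba = character(0))

res <- stack(flagged)[2:1]                # columns: ind, values
names(res) <- c("Analyte", "Lab_ID")      # match the requested headers
res$Analyte <- as.character(res$Analyte)  # ind is a factor; use plain strings
res
#   Analyte Lab_ID
# 1   Al2O3     10
# 2   Al2O3     12
# 3      As     36
```

Converting the ind factor to character is optional, but it avoids carrying an unused "Ba" level along into later statistics.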