Extract data frame name and column name from a list of data frames when a condition is met in R

Question

I have a list of data frames; a small excerpt is shown below.

df <- list(Al2O3 = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(2.04, 2.07, 2.05, 2.07, 2.1, 2.08, 
NA, NA, NA, NA), `3` = c(2.08, 2.1, 2.08, 2.13, 2.1, 2.08, NA, 
NA, NA, NA), `4` = c(2.08, 2.08, 2.09, 2.06, 2.08, 2.07, 2.07, 
2.06, 2.08, 2.08), `5` = c(2.11, 2.09, 2.1, 2.08, 2.09, 2.09, 
NA, NA, NA, NA), `7` = c(2.06, 2.05, 2.04, 2.05, 2.04, 2.03, 
NA, NA, NA, NA), `8` = c(2.078, 2.065, 2.057, 2.063, 2.067, 2.066, 
NA, NA, NA, NA), `10` = c(2.191776681, 2.153987428, 2.153987428, 
2.097303548, 2.116198175, 2.116198175, NA, NA, NA, NA), `12` = c(2.24, 
2.08, 2.12, 2.15, 2.15, 2.15, NA, NA, NA, NA), `36` = c(2.07, 
2.082, 2.048, 2.046, 2.086, 2.069, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L)), As = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(0.002, 0.001, 0.001, 0.001, 0.002, 
0.001, NA, NA, NA, NA), `3` = c(0.003, 0.002, 0.002, 0.002, 0.001, 
0.002, NA, NA, NA, NA), `4` = c(0.001, 0.002, 0.001, 0.002, 0.002, 
0.002, 0.001, 0.002, 0.002, 0.003), `5` = c(0.002, 0.001, 0.001, 
0.001, 0.001, 0.002, NA, NA, NA, NA), `7` = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), `8` = c(NA, 0.001, NA, NA, NA, NA, NA, NA, NA, NA), 
    `10` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `12` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), `36` = c(0.0053, 0.0053, 0.0053, 
    0.00454, 0.0053, 0.0053, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L)), Ba = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
    `3` = c(NA, NA, NA, NA, 0.001, NA, NA, NA, NA, NA), `4` = c(0.004, 
    0.003, 0.003, 0.004, 0.003, 0.002, 0.004, 0.002, 0.005, NA
    ), `5` = c(NA, NA, NA, NA, NA, 0.003, NA, NA, NA, NA), `7` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), `8` = c(0.002, 0.003, NA, 
    NA, NA, 0.002, NA, NA, NA, NA), `10` = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), `12` = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), `36` = c(0.00089566, 0.00089566, 0.00089566, 0.00089566, 
    0.00089566, 0.00089566, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L)))

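I have the following function, which calculates the M-score (a modified z-score) for each column of each data frame in the list and returns the positions of the columns whose M-score exceeds `MscoreMax`.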


library(outliers)
scores_na <- function(x, ...) {
  not_na <- !is.na(x)
  scores <- rep(NA, length(x))
  scores[not_na] <- outliers::scores(na.omit(x), ...)
  scores
}


MscoreMax <- 3

Mscore <- function(x, ...) {
  # Median of each lab column (the first column, Determination_No, is skipped)
  labmedians <- mapply(median, x[-1], na.rm = TRUE)
  median_of_median <- median(labmedians, na.rm = TRUE)
  labMScore <- as.vector(abs(scores_na(labmedians, "mad"))) # calculate M-score by lab
  labMScore[is.infinite(labMScore)] <- 0 # make infinite scores zero
  labMScore[is.nan(labMScore)] <- 0 # make NaN scores zero
  labMScoreIndex <- which(labMScore > MscoreMax) # positions in the vector that exceed MscoreMax
  
  return(labMScoreIndex) 
}

Mindex <- lapply(df, Mscore) #Get the dataframe and column index for Mscore > 3
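
As a side note on what the score above measures, here is a base-R sketch of the same idea. It assumes, per the `outliers` documentation, that the `"mad"` type is the difference from the median divided by the median absolute deviation. `mad_score` is a hypothetical helper name and the input values are made up; neither is part of the code above.

```r
# Hypothetical base-R stand-in for scores_na(x, "mad"):
# (value - median) / MAD, with NA inputs staying NA in the result.
mad_score <- function(x) {
  (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

# Toy lab medians (made-up values): the fifth is far from the rest
# and the sixth lab reported nothing.
lab_medians <- c(1, 2, 3, 4, 100, NA)

m <- abs(mad_score(lab_medians))
flagged <- which(m > 3)  # which() silently skips the NA entry
flagged
```

Only position 5 is flagged here; the `NA` lab is neither flagged nor counted.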

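I want to modify my function so that it creates a data frame containing the data frame name (from the list of data frames) and the column name. These represent the element/compound and the lab ID, which need to be stored so that additional statistics can be run on them.

I would like the final output to look like the example below: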

Analyte Lab_ID
AL2O3     10
AL2O3     12
As        36

At the moment, `Mindex` returns the following instead:

$Al2O3
[1] 7 8

$As
[1] 9

$Ba
integer(0)

Any help is appreciated.

Answer 1

You can extract the `names` of the columns where `labMScore > MscoreMax`, and then use `stack` to turn the resulting named list into a data frame.
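
The name lookup can be tried on a toy data frame first. This is only a sketch: `toy` and the cut-off of 3 applied to raw medians are made up for illustration and stand in for the real M-score test.

```r
# Toy data: first column is the determination number, the rest are labs.
toy <- data.frame(
  Determination_No = 1:3,
  `2`  = c(1, 2, 3),
  `10` = c(4, 5, 6),
  `12` = c(7, NA, 9),
  check.names = FALSE  # keep the numeric lab IDs as column names
)

# as.vector() drops the names, as it does inside Mscore
lab_medians <- as.vector(sapply(toy[-1], median, na.rm = TRUE))
hits <- which(lab_medians > 3)  # positions within toy[-1]

# Shift by one to index into names(toy), which still includes column 1.
names(toy)[hits + 1]

# Equivalent lookup without the offset: index the names of toy[-1] directly.
names(toy[-1])[hits]
```

Both lookups select labs `10` and `12`; the second form avoids having to remember the offset.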

Mscore <- function(x, ...) {
  labmedians <- mapply(median, x[-1], na.rm = TRUE)
  median_of_median <- median(labmedians, na.rm = TRUE)
  labMScore <- as.vector(abs(scores_na(labmedians, "mad")))
  labMScore[is.infinite(labMScore)] <- 0 
  labMScore[is.nan(labMScore)] <- 0 
  # Added + 1 because the 1st column is skipped when calculating the medians
  labMScoreIndex <- names(x)[which(labMScore > MscoreMax) + 1]
  return(labMScoreIndex) 
}

stack(lapply(df, Mscore))[2:1]

#    ind values
#1 Al2O3     10
#2 Al2O3     12
#3    As     36
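
To get the column headers requested in the question (`Analyte`, `Lab_ID`) rather than `ind` and `values`, the stacked result can be renamed. A minimal sketch, assuming `flagged_labs` is a hand-written stand-in for what `lapply(df, Mscore)` returns above:

```r
# Stand-in for lapply(df, Mscore): lab IDs flagged per analyte.
flagged_labs <- list(Al2O3 = c("10", "12"), As = "36", Ba = character(0))

# stack() gives columns values/ind; [2:1] swaps them, setNames() relabels.
result <- setNames(stack(flagged_labs)[2:1], c("Analyte", "Lab_ID"))
result$Analyte <- as.character(result$Analyte)  # stack() returns a factor here
result
```

Analytes with nothing flagged (here `Ba`) contribute no rows to the result.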


r

lapply

dataframe