Closest value and data frame index of all data frame elements of a list
I have a list containing data frames:
test <- list()
test[[1]] <- data.frame(C1=c(0.2,0.4,0.5), C2=c(2,3.5,3.7), C3=c(0.3,4,5))
test[[2]] <- data.frame(C1=c(0.1,0.3,0.6), C2=c(3.9,4.3,8), C3=c(3,5.2,10))
test[[3]] <- data.frame(C1=c(0.4,0.55,0.8), C2=c(8.9,10.3,14), C3=c(7,8.4,11))
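I want to find which row, across all data frames in this list, has the value in a given column (C2 in this example) that is closest to each element of the vector "vector" (below), together with the list index where it occurs (1, 2, or 3 in this example).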
vector <- c(3, 14.4, 7, 0)
The desired answer should look like this:
list.index line.number.in.df C1 C2 C3
1 2 0.4 3.5 4
3 3 0.8 14 11
2 3 0.6 8 10
1 1 0.2 2 0.3
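Using lapply, I managed to solve about 10% of the problem for a single value. However, I cannot do it for a set of values (a vector). My attempt also returns the closest row from every data frame in the list rather than the single closest row across all of them, and I cannot get the corresponding list index. That is: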
value <- 3
lapply(test, function(x) x[which.min(abs(value-x$C2)),])
The result I get:
[[1]]
C1 C2 C3
2 0.4 3.5 4
[[2]]
C1 C2 C3
1 0.1 3.9 3
[[3]]
C1 C2 C3
1 0.4 8.9 7
Could anyone be kind and patient enough to help me get further with this problem?
Thanks in advance, and happy new year.
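You can make use of substrings of the names. Note that this searches every column, not only C2, so the rows differ from the desired output in the question.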
(w <- sapply(v, \(v)
names(which.min(abs(unlist(setNames(test, seq_along(test))) - v)))))
# [1] "2.C31" "3.C23" "3.C31" "2.C11"
t(mapply(\(x, y) c(list=x, line=y, test[[x]][y, ]),
as.numeric(substr(w, 1, 1)), as.numeric(substring(w, 5)))) |>
as.data.frame()
# list line C1 C2 C3
# 1 2 1 0.1 3.9 3
# 2 3 3 0.8 14 11
# 3 3 1 0.4 8.9 7
# 4 2 1 0.1 3.9 3
Note: this uses R >= 4.1 (for the \(x) function shorthand and the |> pipe).
Data:
test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3,
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3,
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA,
-3L)))
v <- c(3, 14.4, 7, 0)
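Parsing the names by fixed character positions works here because the list has fewer than ten elements and every column name is two characters long. A minimal sketch of an alternative, assuming the same test and v as above, builds an explicit lookup table in the same order in which unlist() flattens the values, so no name parsing is needed. The names flat, idx, and lookup are hypothetical and used only for illustration.

```r
# Same example data as above.
test <- list(
  data.frame(C1 = c(0.2, 0.4, 0.5),  C2 = c(2, 3.5, 3.7),   C3 = c(0.3, 4, 5)),
  data.frame(C1 = c(0.1, 0.3, 0.6),  C2 = c(3.9, 4.3, 8),   C3 = c(3, 5.2, 10)),
  data.frame(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3, 14), C3 = c(7, 8.4, 11))
)
v <- c(3, 14.4, 7, 0)

# unlist() flattens list element by list element, column by column, row by row.
flat <- unlist(test, use.names = FALSE)

# Build a lookup table in that same order: expand.grid() varies its first
# argument (line) fastest, then column; list elements are stacked with rbind().
idx <- do.call(rbind, lapply(seq_along(test), function(i) {
  expand.grid(line = seq_len(nrow(test[[i]])),
              column = names(test[[i]]),
              list.index = i,
              stringsAsFactors = FALSE)
}))

# For each target value, look up where the closest flattened value came from.
lookup <- idx[vapply(v, function(x) which.min(abs(flat - x)), integer(1)), ]
rownames(lookup) <- NULL
lookup
```

Like the answer above, this sketch searches all columns. It does not depend on the number of list elements or on the length of the column names.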
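Hopefully this is what you are looking for. It finds the values in the columns of each element of test that are closest to the values in vector.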
#install.packages('birk')
library(birk) # required for which.closest()
# find which of the values across the columns C1:C3 in each element of test are closest
# to the values of vector and return the corresponding row numbers
x <- sapply(1:length(vector), \(x) sapply(test, \(i) apply(i, 2, \(j) which.closest(j, vector[x]))))
x <- apply(x, 1, \(x) as.data.frame(table(x)))
x <- lapply(x, \(i) i[which.max(i[, 2]), ])
row_numbers_df <- as.numeric(matrix(do.call(rbind, x)[['x']]))
# extract the values in each of the column C1:C3 corresponding to row_numbers_df
vals <- array(0, dim = length(row_numbers_df))
for (i in 1:length(row_numbers_df)) { vals[i] <- do.call(cbind, test)[row_numbers_df[i], i] }
# how many columns does each data.frame embedded in test have?
unique_number_of_cols <- unique(sapply(test, ncol))
# store results in a data.frame
r <- \(x) round(x, 1)
out <- data.frame(
seq_len(length(test)),
r(rowMeans(matrix(row_numbers_df, ncol = unique_number_of_cols, byrow = TRUE))),
matrix(vals, ncol = unique_number_of_cols, byrow = TRUE)
)
names(out) <- c('list.index', 'line.number.in.df', sapply(test, colnames)[, 1])
Result:
> out
list.index line.number.in.df C1 C2 C3
1 1 3.0 0.5 3.7 5
2 2 1.7 0.6 3.9 3
3 3 1.7 0.8 8.9 7
Or, if you do want a separate column for each line.number.in.df, you can easily store them as separate columns in out.
x <- sapply(1:length(vector), \(x) sapply(test, \(i) apply(i, 2, \(j) which.closest(j, vector[x]))))
x <- apply(x, 1, \(x) as.data.frame(table(x)))
x <- lapply(x, \(i) i[which.max(i[, 2]), ])
row_numbers_df <- as.numeric(matrix(do.call(rbind, x)[['x']]))
names(row_numbers_df) <- do.call(c, lapply(test, names))
row_numbers_df
vals <- array(0, dim = length(row_numbers_df))
for (i in 1:length(row_numbers_df)) { vals[i] <- do.call(cbind, test)[row_numbers_df[i], i] }
unique_number_of_cols <- unique(sapply(test, ncol))
out <- data.frame(
seq_len(length(test)),
split(row_numbers_df, names(row_numbers_df)),
matrix(vals, ncol = unique_number_of_cols, byrow = TRUE)
)
column_names <- sapply(test, colnames)[, 1]
names(out) <- c('list.index',
paste0('line.number.in.df.', column_names),
column_names)
Result:
> out
list.index line.number.in.df.C1 line.number.in.df.C2 line.number.in.df.C3 C1 C2 C3
1 1 3 3 3 0.5 3.7 5
2 2 3 1 1 0.6 3.9 3
3 3 3 1 1 0.8 8.9 7
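If installing birk is not an option, the per-column lookup can be sketched in base R. This assumes that which.closest(vec, x) returns the index of the element of vec nearest to x; closest_index and rows_for are hypothetical helpers written for illustration, not part of any package.

```r
# Same example data as in the question.
test <- list(
  data.frame(C1 = c(0.2, 0.4, 0.5),  C2 = c(2, 3.5, 3.7),   C3 = c(0.3, 4, 5)),
  data.frame(C1 = c(0.1, 0.3, 0.6),  C2 = c(3.9, 4.3, 8),   C3 = c(3, 5.2, 10)),
  data.frame(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3, 14), C3 = c(7, 8.4, 11))
)

# Hypothetical base R stand-in: index of the element of vec nearest to x.
closest_index <- function(vec, x) which.min(abs(vec - x))

# For one target value, the row number of the closest value in every column
# (matrix rows) of every data frame in the list (matrix columns).
rows_for <- function(target) {
  sapply(test, function(df) sapply(df, closest_index, x = target))
}

m <- rows_for(3)
m
```

Here m["C2", ] holds, for each data frame, the row whose C2 value is nearest to 3.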
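Here is a dplyr approach. We can generate list.index and line.number.in.df for each data frame, then combine them with bind_rows. Next, slice the rows whose C2 holds the closest value to each number in the vector.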
library(dplyr)
test <- list(structure(list(C1 = c(0.2, 0.4, 0.5), C2 = c(2, 3.5, 3.7
), C3 = c(0.3, 4, 5)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.1, 0.3, 0.6), C2 = c(3.9, 4.3,
8), C3 = c(3, 5.2, 10)), class = "data.frame", row.names = c(NA,
-3L)), structure(list(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3,
14), C3 = c(7, 8.4, 11)), class = "data.frame", row.names = c(NA,
-3L)))
vector <- c(3, 14.4, 7, 0)
test %>%
lapply(tibble::rowid_to_column, "line.number.in.df") %>%
bind_rows(.id = "list.index") %>%
slice(vapply(vector, \(x) which.min(abs(x - C2)), integer(1L)))
The output is:
list.index line.number.in.df C1 C2 C3
1 1 2 0.4 3.5 4.0
2 3 3 0.8 14.0 11.0
3 2 3 0.6 8.0 10.0
4 1 1 0.2 2.0 0.3
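The same stack-then-index idea can also be sketched in base R, without dplyr or tibble. This is a minimal sketch under the assumption that all data frames share the same columns; stacked, closest, and result are illustrative names.

```r
# Same example data as above.
test <- list(
  data.frame(C1 = c(0.2, 0.4, 0.5),  C2 = c(2, 3.5, 3.7),   C3 = c(0.3, 4, 5)),
  data.frame(C1 = c(0.1, 0.3, 0.6),  C2 = c(3.9, 4.3, 8),   C3 = c(3, 5.2, 10)),
  data.frame(C1 = c(0.4, 0.55, 0.8), C2 = c(8.9, 10.3, 14), C3 = c(7, 8.4, 11))
)
vector <- c(3, 14.4, 7, 0)

# Stack all data frames, recording each row's list index and row number.
stacked <- do.call(rbind, lapply(seq_along(test), function(i) {
  cbind(list.index = i,
        line.number.in.df = seq_len(nrow(test[[i]])),
        test[[i]])
}))

# For each target value, pick the stacked row whose C2 is closest.
closest <- vapply(vector, function(x) which.min(abs(stacked$C2 - x)), integer(1))
result <- stacked[closest, ]
rownames(result) <- NULL
result
```

Unlike bind_rows(.id = "list.index"), which stores the index as character, this keeps list.index as an integer.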