Compare the colnames within two lists of data.frames?

Question

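I am using a function that removes individual columns from each data.frame in a list when those variables do not meet certain criteria, and I would like a convenient way to see which columns have been removed. The real-world data.frames will have 1000 different colnames, with some overlap between them.

In this simplified example, I would like to get a list showing, for each data.frame, the variables that are present in list1 but not in list2.

Input lists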

> list1
$A
  a b
1 2 3
$B
  c d e
1 9 8 1
$C
  f g
1 6 7

> list2
$A
  a
1 2
$B
  c d
1 9 8
$C
  g
1 7

Desired output

I would like to keep the list structure, so that I can see which columns were removed from each data.frame.

$A
  b
1 3
$B
  e
1 1
$C
  f
1 6

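My attempts

I have looked on SO but only found solutions for comparing data.frames. Keep in mind that the names of the list elements (here A, B and C) are always the same across the lists. My idea was to use setdiff, either on its own or together with mapply, but my tinkering has not been fruitful. What can be done?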

## sample data
list1 <- list(A=data.frame(a=2, b=3), B=data.frame(c=9,d=8,e=1), C=data.frame(f= 6,g=7))
list2 <- list(A=data.frame(a=2), B=data.frame(c=9,d=8), C=data.frame(g=7))
desired_output <- list(A=data.frame(b=3), B=data.frame(e=1), C=data.frame(f= 6))

## attempts

# gives List 1
setdiff(list1, list2)

# gives 'Error: not compatible: Cols in x but not y: `b`.'
mapply(setdiff, x = list1, y = list2)

# gives 'Error in list1[[i]] : recursive indexing failed at level 3'
mapply(setdiff, x = colnames(list1[[i]]), y = colnames(list2[[i]]))

# gives 'list()'
mapply(setdiff, x = colnames(list1[i]), y = colnames(list2[i]))

# Gives 'Error in list1[colnams] : invalid subscript type 'list''
colnams <- list()
for(i in seq_along(list1)){
   colnams[i] <- !colnames(list1[[i]]) %in% colnames(list2[[i]]) 
}
list1[colnams]

Answer 1

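You can apply a function that subsets one data.frame based on the columns of the other, using drop = FALSE so that it always returns a data.frame. Also use SIMPLIFY = FALSE in mapply so that the result always keeps the list structure.

mapply(function(x, y) x[, !(names(x) %in% names(y)), drop = FALSE], list1, list2, SIMPLIFY = FALSE)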
#> $A
#>   b
#> 1 3
#> 
#> $B
#>   e
#> 1 1
#> 
#> $C
#>   f
#> 1 6
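
One edge case to keep in mind with this kind of subsetting: if the selection is written with negative indices, as in x[, -which(...)], and the two data.frames share no column at all, which() returns an empty vector and the negative index selects nothing instead of everything. The sketch below illustrates this with a hypothetical pair x and y (not part of the question's data) and compares it with a logical index.

```r
# Hypothetical pair of data.frames that share no column name.
x <- data.frame(a = 2, b = 3)
y <- data.frame(z = 0)

# which() finds no shared columns, so idx is integer(0).
idx <- which(names(x) %in% names(y))

# -integer(0) is still integer(0): this selects zero columns.
neg <- x[, -idx, drop = FALSE]

# A logical index keeps every column that is not in y.
lgl <- x[, !(names(x) %in% names(y)), drop = FALSE]

ncol(neg)  # 0
ncol(lgl)  # 2
```

With the sample lists from the question every pair shares at least one column, so both forms give the same result there.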

Answer 2

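You can use lapply to extract the names and setdiff to get the names that are not in the other list. Because the elements are matched by name, the two lists do not need to be in the same order.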

x <- lapply(list1, names)
y <- lapply(list2, names)
lapply(setNames(names(x), names(x)), function(i) list1[[i]][setdiff(x[[i]], y[[i]])])
#$A
#  b
#1 3
#
#$B
#  e
#1 1
#
#$C
#  f
#1 6
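
As a small check of the name-based matching, here is a sketch that runs the same idea against a reordered copy of list2. The name list2_shuffled is made up for this illustration. A positional pairing such as mapply would compare A with C here, whereas indexing by element name still pairs A with A.

```r
# Sample data from the question.
list1 <- list(A = data.frame(a = 2, b = 3),
              B = data.frame(c = 9, d = 8, e = 1),
              C = data.frame(f = 6, g = 7))

# Hypothetical reordered version of list2 (same elements, different order).
list2_shuffled <- list(C = data.frame(g = 7),
                       A = data.frame(a = 2),
                       B = data.frame(c = 9, d = 8))

# Iterate over the element names of list1 and look up each partner by name.
removed <- lapply(setNames(names(list1), names(list1)), function(i) {
  list1[[i]][setdiff(names(list1[[i]]), names(list2_shuffled[[i]]))]
})

str(removed)
```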

Answer 3

With purrr:

library(purrr)

map2(.x = list1,
     .y = list2,
     ~ .x[setdiff(names(.x), names(.y))])

$A
  b
1 3

$B
  e
1 1

$C
  f
1 6
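
With around 1000 columns per data.frame, it may be easier to inspect just the names of the removed columns rather than the columns themselves. The following is a minimal base R sketch of that variation, using the same setdiff pattern. The name removed_names is an assumption made for this example.

```r
# Sample data from the question.
list1 <- list(A = data.frame(a = 2, b = 3),
              B = data.frame(c = 9, d = 8, e = 1),
              C = data.frame(f = 6, g = 7))
list2 <- list(A = data.frame(a = 2),
              B = data.frame(c = 9, d = 8),
              C = data.frame(g = 7))

# Pair the data.frames by position and keep only the dropped column names.
# Map() always returns a list and takes its names from list1.
removed_names <- Map(function(x, y) setdiff(names(x), names(y)), list1, list2)

removed_names           # list(A = "b", B = "e", C = "f")
lengths(removed_names)  # number of removed columns per data.frame
```

Like mapply, Map pairs elements by position, so this assumes both lists are in the same order.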
