Add unique identifier column to a nested list of data frames

Question

Sample data

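I am working with a nested list of data frames. My list contains 1,000+ lists, and each of those lists contains a data frame as its only element. Each data frame has 30+ observations of 10+ variables. For simplicity, here is a small example list: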

df1 <- tibble::tibble(a = 1:25, b = 1:25, c = 1:25, d = 1:25, e = 1:25)
df2 <- tibble::tibble(a = 1:35, b = 1:35, c = 1:35, d = 1:35, e = 1:35)
df3 <- tibble::tibble(a = 1:30, b = 1:30, c = 1:30, d = 1:30, e = 1:30)
df4 <- tibble::tibble(a = 1:20, b = 1:20, c = 1:20, d = 1:20, e = 1:20)

dfs_list <- list(list(df1), list(df2), list(df3), list(df4))
dfs_list

[[1]]
[[1]][[1]]
# A tibble: 25 x 5
       a     b     c     d     e
   <int> <int> <int> <int> <int>
 1     1     1     1     1     1
 2     2     2     2     2     2
 3     3     3     3     3     3
 4     4     4     4     4     4
 5     5     5     5     5     5
 6     6     6     6     6     6
 7     7     7     7     7     7
 8     8     8     8     8     8
 9     9     9     9     9     9
10    10    10    10    10    10
# ... with 15 more rows


[[2]]
[[2]][[1]]
# A tibble: 35 x 5
       a     b     c     d     e
   <int> <int> <int> <int> <int>
 1     1     1     1     1     1
 2     2     2     2     2     2
 3     3     3     3     3     3
 4     4     4     4     4     4
 5     5     5     5     5     5
 6     6     6     6     6     6
 7     7     7     7     7     7
 8     8     8     8     8     8
 9     9     9     9     9     9
10    10    10    10    10    10
# ... with 25 more rows


[[3]]
[[3]][[1]]
# A tibble: 30 x 5
       a     b     c     d     e
   <int> <int> <int> <int> <int>
 1     1     1     1     1     1
 2     2     2     2     2     2
 3     3     3     3     3     3
 4     4     4     4     4     4
 5     5     5     5     5     5
 6     6     6     6     6     6
 7     7     7     7     7     7
 8     8     8     8     8     8
 9     9     9     9     9     9
10    10    10    10    10    10
# ... with 20 more rows


[[4]]
[[4]][[1]]
# A tibble: 20 x 5
       a     b     c     d     e
   <int> <int> <int> <int> <int>
 1     1     1     1     1     1
 2     2     2     2     2     2
 3     3     3     3     3     3
 4     4     4     4     4     4
 5     5     5     5     5     5
 6     6     6     6     6     6
 7     7     7     7     7     7
 8     8     8     8     8     8
 9     9     9     9     9     9
10    10    10    10    10    10
# ... with 10 more rows

Desired output

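I am trying to add a column containing a unique identifier to each data frame in the list. The identifier will be built from two number sequences, say 1:10 and 1:100. For example, the column in the first data frame would contain 1.1, the column in the second would contain 2.1, and so on, all the way up to 10.100.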
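As a sketch of how the full set of labels could be generated (assuming 1:10 and 1:100 as above, which gives 10 x 100 = 1,000 labels, one per data frame): expand.grid() varies its first argument fastest, so pasting each row produces 1.1, 2.1, ..., 10.1, 1.2, and so on up to 10.100. The column names i and j are illustrative only.

```r
# Every combination of the two sequences; expand.grid() varies `i` fastest,
# so rows come out as (1,1), (2,1), ..., (10,1), (1,2), ..., (10,100)
grid <- expand.grid(i = 1:10, j = 1:100)

# Join each pair into an "i.j" label. Keeping it as character matters:
# as numbers, 10.10 and 10.1 would be the same value
ids <- paste(grid$i, grid$j, sep = ".")

length(ids)   # 1000
head(ids, 3)  # "1.1" "2.1" "3.1"
tail(ids, 1)  # "10.100"
```

Storing the identifier as character rather than numeric is the design choice that keeps labels such as 10.10 and 10.1 distinct.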

To start with a smaller example, let's make my number sequences 1:2 and 1:2. The identifier column below is what I want to add to each data frame in the list:

[[1]]
[[1]][[1]]
# A tibble: 25 x 6
       a     b     c     d     e identifier
   <int> <int> <int> <int> <int> <chr>     
 1     1     1     1     1     1 1.1       
 2     2     2     2     2     2 1.1       
 3     3     3     3     3     3 1.1       
 4     4     4     4     4     4 1.1       
 5     5     5     5     5     5 1.1       
 6     6     6     6     6     6 1.1       
 7     7     7     7     7     7 1.1       
 8     8     8     8     8     8 1.1       
 9     9     9     9     9     9 1.1       
10    10    10    10    10    10 1.1       
# ... with 15 more rows


[[2]]
[[2]][[1]]
# A tibble: 35 x 6
       a     b     c     d     e identifier
   <int> <int> <int> <int> <int> <chr>     
 1     1     1     1     1     1 2.1      
 2     2     2     2     2     2 2.1       
 3     3     3     3     3     3 2.1       
 4     4     4     4     4     4 2.1       
 5     5     5     5     5     5 2.1       
 6     6     6     6     6     6 2.1       
 7     7     7     7     7     7 2.1       
 8     8     8     8     8     8 2.1       
 9     9     9     9     9     9 2.1       
10    10    10    10    10    10 2.1       
# ... with 25 more rows


[[3]]
[[3]][[1]]
# A tibble: 30 x 6
       a     b     c     d     e identifier
   <int> <int> <int> <int> <int> <chr>     
 1     1     1     1     1     1 1.2       
 2     2     2     2     2     2 1.2       
 3     3     3     3     3     3 1.2       
 4     4     4     4     4     4 1.2       
 5     5     5     5     5     5 1.2       
 6     6     6     6     6     6 1.2       
 7     7     7     7     7     7 1.2       
 8     8     8     8     8     8 1.2       
 9     9     9     9     9     9 1.2       
10    10    10    10    10    10 1.2       
# ... with 20 more rows


[[4]]
[[4]][[1]]
# A tibble: 20 x 6
       a     b     c     d     e identifier
   <int> <int> <int> <int> <int> <chr>     
 1     1     1     1     1     1 2.2       
 2     2     2     2     2     2 2.2       
 3     3     3     3     3     3 2.2       
 4     4     4     4     4     4 2.2       
 5     5     5     5     5     5 2.2       
 6     6     6     6     6     6 2.2       
 7     7     7     7     7     7 2.2       
 8     8     8     8     8     8 2.2       
 9     9     9     9     9     9 2.2       
10    10    10    10    10    10 2.2       
# ... with 10 more rows

What I've tried

I tried using apply(expand.grid()) to create an array of identifiers, and then using mapply() to bind one element of that array to each data frame:

a.b <- apply(expand.grid(c(1:2), c(1:2)), 1, paste, collapse = '.')
mapply(cbind, dfs_list, "Identifier" = a.b, SIMPLIFY = F)

However, the column is inserted into the parent list instead of directly into the data frame:

[[1]]
               Identifier    
[1,] tbl_df,5 "1.1"

[[2]]
               Identifier    
[1,] tbl_df,5 "2.1"

[[3]]
               Identifier    
[1,] tbl_df,5 "1.2"

[[4]]
               Identifier    
[1,] tbl_df,5 "2.2"
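A plausible reading of this output: mapply() walks over dfs_list itself, so each call is cbind(<inner list>, Identifier = "1.1"). Because that first argument is a plain list rather than a data frame, cbind() uses its default method and builds a 1 x 2 matrix of mode list, which is what the tbl_df,5 cell shows. The sketch below reproduces this with a small base data.frame standing in for a tibble (the names inner, bad and good are illustrative), then applies cbind() one level down, to the data frame inside the inner list.

```r
# One inner list holding a single data frame, shaped like an element of dfs_list
inner <- list(data.frame(a = 1:3, b = 1:3))

# cbind() on the *list* gives a 1 x 2 list-matrix, not a data frame
bad <- cbind(inner, Identifier = "1.1")
dim(bad)     # 1 2
mode(bad)    # "list"

# cbind() on the data frame *inside* the list adds a real column,
# recycling the single label to every row
good <- cbind(inner[[1]], Identifier = "1.1")
names(good)  # "a" "b" "Identifier"
nrow(good)   # 3
```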

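After some trial and error, I tried a slightly different approach later that evening. At first I thought I had solved the problem, but the resulting list was 13 GB instead of the previous 19 MB, and it took (relatively) a very long time to write, so I doubt this is the solution. This morning I also couldn't reproduce the result with my example dataset.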

> dfs_identify <- dfs_list %>% 
+   apply(function(z) mapply(cbind, z, "Identifier" = a.b, SIMPLIFY = F))
Error in match.fun(FUN) : argument "FUN" is missing, with no default
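The error comes from apply() itself: it expects an array plus a MARGIN argument, so piping dfs_list in puts the anonymous function into MARGIN and leaves FUN missing. lapply() is the list counterpart. Even with lapply(), though, this nesting pairs each inner list with all of a.b: mapply() recycles the length-1 inner list against every label, so each data frame is copied once per identifier. With about 1,000 labels that would multiply the data roughly a thousandfold, which may account for the jump from 19 MB to 13 GB; that is an inference, not something checked against the real data. A small sketch with base data frames standing in for df1 to df4 (the name dfs is illustrative):

```r
# Four inner lists, each holding one data frame, like the question's dfs_list
dfs <- lapply(c(25, 35, 30, 20), function(n) list(data.frame(a = seq_len(n))))
a.b <- apply(expand.grid(1:2, 1:2), 1, paste, collapse = ".")

# lapply() is the list equivalent of the apply() call that failed
res <- lapply(dfs, function(z) mapply(cbind, z, Identifier = a.b, SIMPLIFY = FALSE))

# Each inner list now holds 4 data frames (one per label) instead of 1
lengths(res)  # 4 4 4 4
```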

Answer 1

Map(`names<-`, dfs_list, a.b)

This gives each list item the name you created. It doesn't say "Identifier", but I think this is what you're after.
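To see what this produces: `names<-` is the replacement function behind names(x) <- value, so Map() calls it once per (inner list, label) pair and returns each inner list with its single element named by the matching label. The identifier therefore lives in the list structure, not in a column of the data frame. A quick check with base data frames (dfs and named are illustrative names):

```r
dfs <- lapply(1:4, function(k) list(data.frame(a = 1:3)))
a.b <- apply(expand.grid(1:2, 1:2), 1, paste, collapse = ".")

# Equivalent to: for each i, names(dfs[[i]]) <- a.b[i]
named <- Map(`names<-`, dfs, a.b)

names(named[[3]])       # "1.2"
names(named[[3]][[1]])  # "a"  (the data frame itself has no new column)
```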

Edit:

Map(function(x, y) list(cbind(x[[1]], "Identifier" = y)), dfs_list, a.b)

This gives a new Identifier column. x[[1]] reaches inside the nested structure, and list() recreates the original nesting. Map is the same as mapply(..., SIMPLIFY = FALSE).
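One detail worth checking: cbind() on a data frame goes through cbind.data.frame(), which calls data.frame(), so a tibble input may come back as a plain data.frame. If keeping the tibble class matters, a variant that assigns the column with $<- inside the same Map() structure avoids the conversion. The helper name add_id is hypothetical, and the example uses base data frames so it runs without extra packages.

```r
dfs <- lapply(c(25, 35, 30, 20), function(n) list(data.frame(a = seq_len(n), b = seq_len(n))))
a.b <- apply(expand.grid(1:2, 1:2), 1, paste, collapse = ".")

# Hypothetical helper: add the label as a column of the single data frame
# inside one inner list, then return the inner list so the nesting is kept
add_id <- function(inner, id) {
  inner[[1]]$Identifier <- id  # a length-1 value is recycled to every row
  inner
}

out <- Map(add_id, dfs, a.b)

out[[3]][[1]]$Identifier[1]  # "1.2"
nrow(out[[4]][[1]])          # 20
```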

Answer 2

You can use lapply and Map. Try this approach:

result <- lapply(seq_along(dfs_list), function(x) {
  Map(cbind, dfs_list[[x]], 
             Identifier = paste(x, seq_along(dfs_list[[x]]), sep = '.'))
})

result

#[[1]]
#[[1]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        1.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.1
#Datsun 710        22.8   4  108  93 3.85        1.1
#Hornet 4 Drive    21.4   6  258 110 3.08        1.1
#Hornet Sportabout 18.7   8  360 175 3.15        1.1

#[[1]][[2]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4 Wag     21.0   6  160 110 3.90        1.2
#Datsun 710        22.8   4  108  93 3.85        1.2
#Hornet 4 Drive    21.4   6  258 110 3.08        1.2
#Hornet Sportabout 18.7   8  360 175 3.15        1.2


#[[2]]
#[[2]][[1]]
#                   mpg cyl disp  hp drat Identifier
#Mazda RX4         21.0   6  160 110 3.90        2.1
#Mazda RX4 Wag     21.0   6  160 110 3.90        2.1
#Datsun 710        22.8   4  108  93 3.85        2.1
#Hornet 4 Drive    21.4   6  258 110 3.08        2.1
#Hornet Sportabout 18.7   8  360 175 3.15        2.1

Data

It would be easier to help if you provided the data in a reproducible format.

dfs_list <- list(list(mtcars[1:5, 1:5],mtcars[2:5, 1:5]), list(mtcars[1:5, 1:5]))
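Applied to the structure in the question, where every inner list holds exactly one data frame, seq_along(dfs_list[[x]]) is always 1, so this indexing labels the four data frames 1.1, 2.1, 3.1, 4.1 rather than the 1.1, 2.1, 1.2, 2.2 order produced by expand.grid(). The sketch below checks that with base data frames standing in for the tibbles; the name dfs_q is illustrative, and the character result assumes R 4.0 or later (where strings are not converted to factors by default).

```r
# Question-shaped data: four inner lists, one data frame each
dfs_q <- lapply(c(25, 35, 30, 20), function(n) list(data.frame(a = seq_len(n))))

result <- lapply(seq_along(dfs_q), function(x) {
  Map(cbind, dfs_q[[x]],
      Identifier = paste(x, seq_along(dfs_q[[x]]), sep = "."))
})

# First identifier of each data frame
sapply(result, function(inner) inner[[1]]$Identifier[1])
# "1.1" "2.1" "3.1" "4.1"
```

If the expand.grid() labels are wanted instead, they can be passed to Map() directly, as in Answer 1's edit.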
