Conditionally rename columns in list, based on separate character vector stored in data.frame

I have a list of tibbles named lst:

> lst
[[1]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   150  1234   201
2 Metric 2   190  3456   201

[[2]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   190  1231   202
2 Metric 2   120  3356   202

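I also have a separate tibble named df, with a column containing character vectors (stored as strings) that should be used to rename the columns in lst: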

# A tibble: 2 x 2
  colnames                                                      id
  <chr>                                                      <dbl>
1 c(' ','Ranking 1 for School A', 'Ranking 2 for School A')    201
2 c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')   202

I'm looking for a way, preferably using some form of map from purrr, to remove the id column and rename the columns of each tibble in lst based on the values in df.

Any suggestions are greatly appreciated. Thank you in advance.

Desired output:

[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356

Data:

lst <- list(structure(list(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 
190), temp3 = c(1234, 3456), id = c(201, 201)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame")), structure(list(
    temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120), temp3 = c(1231, 
    3356), id = c(202, 202)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame")))

df <- structure(list(colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')", 
"c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"), 
    id = c(201, 202)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))
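library(tidyverse)

# pmap(df, ~ .) returns the first column (the colnames string) of each row of df;
# .x[-4] drops the id column and eval(parse()) turns the string into a character vector
map2(lst, pmap(df, ~.), ~ set_names(.x[-4], eval(parse(text = .y))))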
#> [[1]]
#> # A tibble: 2 x 3
#>   ` `      `Ranking 1 for School A` `Ranking 2 for School A`
#>   <chr>                       <dbl>                    <dbl>
#> 1 Metric 1                      150                     1234
#> 2 Metric 2                      190                     3456
#> 
#> [[2]]
#> # A tibble: 2 x 3
#>   ` `      `Ranking 1 for School B` `Ranking 2 for School B`
#>   <chr>                       <dbl>                    <dbl>
#> 1 Metric 1                      190                     1231
#> 2 Metric 2                      120                     3356

Created on 2021-07-01 by the reprex package (v2.0.0)
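Because eval(parse(text = ...)) executes whatever R code is stored in colnames, a variant that only extracts the single-quoted strings may be preferable if df comes from an untrusted source. This is a sketch, assuming every name is wrapped in single quotes with no embedded quotes; it keeps the positional pairing of the answer above and uses stringr::str_match_all() (the helper name extract_quoted is hypothetical):

```r
library(tidyverse)

# Example data from the question
lst <- list(
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
         temp3 = c(1234, 3456), id = c(201, 201)),
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
         temp3 = c(1231, 3356), id = c(202, 202))
)
df <- tibble(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202)
)

# Hypothetical helper: return the text between each pair of single quotes
# (capture group 1 of every match) instead of evaluating the string as code
extract_quoted <- function(x) str_match_all(x, "'([^']*)'")[[1]][, 2]

# Drop id, then rename using the quoted strings from the matching position
res <- map2(lst, df$colnames,
            ~ set_names(select(.x, -id), extract_quoted(.y)))
res
```

The regex consumes both quotes of each match, so the commas between names are never picked up as extra names.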

At face value, this is what I would do:

library(tidyverse)

seq_along(lst) %>% map(
        .f = function(x) {
                
                # Store the list element without the id column
                tmp <- lst[[x]] %>% 
                        select(-id)
                
                
                # Rename columns by evaluating the stored character vector
                colnames(tmp) <- df$colnames[[x]] %>%
                                    parse(text = .) %>% 
                                       eval()
                
                # Return the modified data 
                tmp
                
        }
)

Note: obviously, this assumes that lst and colnames are stored in the same order, so element 1 of the list uses element 1 of df[, "colnames"].
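If the order of lst and df cannot be guaranteed, the names can be looked up by id instead of by position. A minimal sketch, assuming each tibble carries a single repeated id value and every id appears exactly once in df (rename_by_id is a hypothetical helper name):

```r
library(tidyverse)

# Example data from the question
lst <- list(
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
         temp3 = c(1234, 3456), id = c(201, 201)),
  tibble(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
         temp3 = c(1231, 3356), id = c(202, 202))
)
df <- tibble(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202)
)

# Hypothetical helper: find the names for one tibble via its (repeated) id
rename_by_id <- function(tbl) {
  new_names <- df$colnames[df$id == tbl$id[1]]
  # Fail loudly if the id is missing from df or appears more than once
  stopifnot(length(new_names) == 1)
  tbl %>%
    select(-id) %>%
    set_names(eval(parse(text = new_names)))
}

# Feed the list in reverse order: each tibble still receives its own names
res <- map(rev(lst), rename_by_id)
res
```

Matching on id[1] avoids comparing a length-2 id column against df$id, which only lines up by coincidence when both happen to be in the same order.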

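I see that you prefer a tidyverse answer and already have at least one good one, so I thought I'd share a non-tidyverse approach in case anyone who comes along later is interested...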

library(qdapRegex)
for(i in seq_along(lst)){
  # extract field names from the row of 'df' whose 'id' matches this element
  new_names <- qdapRegex::rm_between(df$colnames[df$id == lst[[i]]$id[1]], "'", "'", extract = TRUE)
  # drop the 'id' field
  lst[[i]] <- lst[[i]][names(lst[[i]]) != "id"]
  # rename the remaining fields
  names(lst[[i]]) <- new_names[[1]]
}
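The qdapRegex dependency is optional: base R's regmatches() together with gregexpr() can extract the same single-quoted names. A sketch using plain data frames, assuming the names never contain embedded single quotes:

```r
# Example data from the question, as plain data frames
lst <- list(
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
             temp3 = c(1234, 3456), id = c(201, 201), stringsAsFactors = FALSE),
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
             temp3 = c(1231, 3356), id = c(202, 202), stringsAsFactors = FALSE)
)
df <- data.frame(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202),
  stringsAsFactors = FALSE
)

for (i in seq_along(lst)) {
  # Row of 'df' whose id matches this element
  spec <- df$colnames[df$id == lst[[i]]$id[1]]
  # Every single-quoted substring, quotes included, e.g. "' '"
  quoted <- regmatches(spec, gregexpr("'[^']*'", spec))[[1]]
  # Strip the surrounding quotes
  new_names <- gsub("^'|'$", "", quoted)
  # Drop 'id', then rename the remaining columns
  lst[[i]] <- lst[[i]][setdiff(names(lst[[i]]), "id")]
  names(lst[[i]]) <- new_names
}
lst
```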

You could also use the following solution. First, we split the colnames variable of the second data frame into separate columns:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  # strip the leading "c(", the closing ")" and the single quotes
  mutate(colnames = gsub("^c\\(|\\)$|'", "", colnames)) %>%
  separate(colnames, into = paste("col", 1:3, sep = "_"), sep = ",\\s?") -> DF

DF
# A tibble: 2 x 4
  col_1 col_2                  col_3                     id
  <chr> <chr>                  <chr>                  <dbl>
1 " "   Ranking 1 for School A Ranking 2 for School A   201
2 " "   Ranking 1 for School B Ranking 2 for School B   202

Then we use it to replace the old column names in the list elements:

lst %>%
  map(~ .x %>% 
        set_names(DF %>% filter(id == .x$id[1]) %>% unlist()) %>%
        select(-length(.)))

[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356