Conditionally rename columns in a list of tibbles, based on character vectors stored as strings in a separate data.frame
I have a list of tibbles named lst:
> lst
[[1]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   150  1234   201
2 Metric 2   190  3456   201

[[2]]
# A tibble: 2 x 4
  temp1    temp2 temp3    id
  <chr>    <dbl> <dbl> <dbl>
1 Metric 1   190  1231   202
2 Metric 2   120  3356   202
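I also have a separate tibble named df. Its colnames column holds, as plain strings, the character vectors that should become the new column names of the tibbles in lst: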
# A tibble: 2 x 2
  colnames                                                      id
  <chr>                                                      <dbl>
1 c(' ','Ranking 1 for School A', 'Ranking 2 for School A')    201
2 c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')   202
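I am looking for a way, preferably using some form of map from purrr, to drop the id column and rename the columns of each tibble in lst based on the values in df.
Any suggestions are much appreciated. Thank you in advance.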
Desired output:
[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356
Data:
lst <- list(structure(list(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150,
190), temp3 = c(1234, 3456), id = c(201, 201)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame")), structure(list(
temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120), temp3 = c(1231,
3356), id = c(202, 202)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")))
df <- structure(list(colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
"c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
id = c(201, 202)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
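library(tidyverse)
# pmap(df, ~ .) returns the first column of df (colnames) row by row, so each
# tibble in lst is paired with its name string by position.
# .x[-4] drops the fourth column (id); eval(parse(text = .y)) turns the string
# into a character vector. eval(parse()) runs whatever code the string
# contains, so only use it on trusted data.
map2(lst, pmap(df, ~.), ~ set_names(.x[-4], eval(parse(text = .y))))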
#> [[1]]
#> # A tibble: 2 x 3
#> ` ` `Ranking 1 for School A` `Ranking 2 for School A`
#> <chr> <dbl> <dbl>
#> 1 Metric 1 150 1234
#> 2 Metric 2 190 3456
#>
#> [[2]]
#> # A tibble: 2 x 3
#> ` ` `Ranking 1 for School B` `Ranking 2 for School B`
#> <chr> <dbl> <dbl>
#> 1 Metric 1 190 1231
#> 2 Metric 2 120 3356
Created on 2021-07-01 by the reprex package (v2.0.0)
At face value, this is what I would do:
library(tidyverse)
seq_along(lst) %>% map(
  .f = function(x) {
    # Take the x-th tibble and drop its id column
    tmp <- lst[[x]] %>%
      select(-id)
    # Rename the columns: parse the x-th string in df$colnames
    # into a character vector
    colnames(tmp) <- df$colnames[x] %>%
      parse(text = .) %>%
      eval()
    # Return the modified tibble
    tmp
  }
)
Note:
Obviously, this assumes that lst and colnames are stored in the same order, so that element 1 of the list uses element 1 of df[, "colnames"].
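The positional assumption in the note above can be avoided by looking up each tibble's names through its id. The following is a minimal base R sketch, not taken from any of the answers: the helpers extract_quoted and rename_by_id are hypothetical names introduced here, the sample data is rebuilt as plain data frames, and the quoted names are pulled out with a regular expression instead of eval(parse()).

```r
# Sample data from the question, rebuilt as plain data frames
lst <- list(
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(150, 190),
             temp3 = c(1234, 3456), id = c(201, 201),
             stringsAsFactors = FALSE),
  data.frame(temp1 = c("Metric 1", "Metric 2"), temp2 = c(190, 120),
             temp3 = c(1231, 3356), id = c(202, 202),
             stringsAsFactors = FALSE)
)
df <- data.frame(
  colnames = c("c(' ','Ranking 1 for School A', 'Ranking 2 for School A')",
               "c(' ', 'Ranking 1 for School B', 'Ranking 2 for School B')"),
  id = c(201, 202),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single-quoted pieces of one string,
# with the quotes themselves removed
extract_quoted <- function(x) {
  quoted <- regmatches(x, gregexpr("'[^']*'", x))[[1]]
  gsub("'", "", quoted, fixed = TRUE)
}

# Hypothetical helper: find the row of `lookup` whose id matches the table,
# drop the id column, and apply the names stored in that row
rename_by_id <- function(tbl, lookup) {
  key <- unique(tbl$id)
  stopifnot(length(key) == 1)            # one id per table is assumed
  new_names <- extract_quoted(lookup$colnames[match(key, lookup$id)])
  out <- tbl[names(tbl) != "id"]
  stopifnot(length(new_names) == ncol(out))
  names(out) <- new_names
  out
}

result <- lapply(lst, rename_by_id, lookup = df)
```

Because the lookup goes through match() on id, the order of the rows in df no longer has to follow the order of lst. lapply could be swapped for purrr::map without other changes.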
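I see that you prefer a tidyverse answer and already have at least one good one, so I thought I would share a non-tidyverse approach in case anyone who comes along later is interested...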
library(qdapRegex)
for (i in seq_along(lst)) {
  # extract the new field names from the row of df that matches this tibble's id
  new_names <- qdapRegex::rm_between(df$colnames[df$id == lst[[i]]$id[1]], "'", "'", extract = TRUE)
  # rename fields; the fourth column (id) receives no name and becomes NA
  names(lst[[i]]) <- new_names[[1]]
  # drop the field whose name is NA
  lst[[i]] <- lst[[i]][!is.na(names(lst[[i]]))]
}
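You can also use the following solution. First we split the colnames variable of the second data frame into separate columns. The pattern strips only the leading c(, the trailing ) and the single quotes; a character class such as [c()] would also delete the c in "School" and leave the quotes in the names.
library(dplyr)
library(tidyr)
library(purrr)
df %>%
  mutate(colnames = gsub("^c\\(|\\)$|'", "", colnames)) %>%
  separate(colnames, into = paste("col", 1:3, sep = "_"), sep = ",\\s?") -> DF
DF now holds the cleaned names in col_1, col_2 and col_3 (" ", "Ranking 1 for School A" and "Ranking 2 for School A" for id 201, and the School B equivalents for id 202), next to the id column.
Then we use it to replace the old column names in each list element, matching on id and dropping the last column (the former id):
lst %>%
  map(~ .x %>%
        set_names(DF %>% filter(id == .x$id[1]) %>% unlist()) %>%
        select(-length(.)))
[[1]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School A` `Ranking 2 for School A`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      150                     1234
2 Metric 2                      190                     3456

[[2]]
# A tibble: 2 x 3
  ` `      `Ranking 1 for School B` `Ranking 2 for School B`
  <chr>                       <dbl>                    <dbl>
1 Metric 1                      190                     1231
2 Metric 2                      120                     3356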