Replacing column names with another data frame if matches

Question

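Hi, I'm working out how to match data frames by column and then rename the columns. If a column has no matching name, I want to drop it.

For example, I would use this main data set, called DF1: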

Name	Reference	Good	Fair	Bad	Great	Poor
George	Hill	34	21	33	21	32
Frank	Stairs	29	28	29	30	29
Bertha	Trail	25	25	24	21	26

Then there is another data frame, called DF2, that lets me replace the names of DF1's columns:

Name	Adjusted_Name
Good	good_run
Great	very_great_work
Bad	bad run
Fair	fair run decent

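Essentially, the replacement names don't follow any pattern. I want to match the first column of DF2 against the column names of DF1: if a value in DF2$Name matches a column name in DF1 (whichever column it is), I replace that column name with DF2$Adjusted_Name from the same row. If a column of DF1 has no match, it is dropped from DF1.

So the end goal is to get: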

Name	Reference	good_run	fair run decent	bad run	very_great_work
George	Hill	34	21	33	21
Frank	Stairs	29	28	29	30
Bertha	Trail	25	25	24	21

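In this case, "Poor" is dropped because that DF1 column name has no match in DF2$Name.

How would I go about this? How would I do it if there were thousands of columns, and would that change how I code it? I'm fairly new to R and would appreciate any tips. Thanks!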

Answer 1

Try the following: using the list of adjusted names, you can grep the desired words against the column names and subset the data frame on the result:

Data

df <- read.table(header = TRUE, text = "Name    Reference   Good    Fair    Bad Great   Poor
                 George Hill    34  21  33  21  32
                 Frank  Stairs  29  28  29  30  29
                 Bertha Trail   25  25  24  21  26")

adj_name <- c("good_run","very_great_work","bad run","fair run decent")

Index the columns via grep on the string of desired names (note also the tolower() on the column names):

desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")

df[,c(1:2,grep(desired_words, tolower(names(df))))]

Output

#    Name Reference Good Fair Bad Great
#1 George      Hill   34   21  33    21
#2  Frank    Stairs   29   28  29    30
#3 Bertha     Trail   25   25  24    21
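The grep step above only subsets the columns; it does not rename them, and a regex can also pick up partial matches. As a minimal sketch of an exact-match alternative in base R, the lookup below uses match() on the column names. The names id_cols, idx, keep, new_names and result are illustrative, and treating Name and Reference as identifier columns that are always kept is an assumption taken from the desired output in the question.

```r
df <- read.table(header = TRUE, text = "Name Reference Good Fair Bad Great Poor
George Hill 34 21 33 21 32
Frank Stairs 29 28 29 30 29
Bertha Trail 25 25 24 21 26")

# Lookup table from the question (DF2)
df2 <- data.frame(
  Name = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad run", "fair run decent"),
  stringsAsFactors = FALSE
)

# Assumption: these identifier columns are always kept and never renamed
id_cols <- c("Name", "Reference")

# Position of each df column name in df2$Name; NA where there is no match
idx <- match(names(df), df2$Name)

# Keep identifier columns and every column that has a match in df2
keep <- names(df) %in% id_cols | !is.na(idx)

# Use the adjusted name where there is a match, otherwise the original name
new_names <- ifelse(is.na(idx), names(df), df2$Adjusted_Name[idx])

result <- df[, keep, drop = FALSE]
names(result) <- new_names[keep]
```

Because match() is vectorised over all column names at once, the same code should apply unchanged to a data frame with thousands of columns.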

Answer 2

If you are open to a tidyverse solution, you could use

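library(dplyr)
library(tibble)

# deframe(df2) turns the two columns into a named vector: names = Name, values = Adjusted_Name
df %>% 
  # rename the columns listed in df2$Name to their adjusted names
  rename_with(~deframe(df2)[.x], .cols = df2$Name) %>% 
  # keep the identifier columns plus the renamed ones; unmatched columns (Poor) are dropped
  select(Name, Reference, any_of(df2$Adjusted_Name))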

This returns

# A tibble: 3 x 6
  Name   Reference good_run very_great_work bad_run fair_run_decent
  <chr>  <chr>        <dbl>           <dbl>   <dbl>           <dbl>
1 George Hill            34              21      33              21
2 Frank  Stairs          29              30      29              28
3 Bertha Trail           25              21      24              25
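If df2 might list names that do not exist in df, passing them directly to .cols would fail. A hedged variation, assuming a reasonably recent dplyr that accepts a named character vector inside any_of(), is sketched below. The extra "Missing" row and the names lookup and result are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  Name = c("George", "Frank", "Bertha"),
  Reference = c("Hill", "Stairs", "Trail"),
  Good = c(34, 29, 25), Fair = c(21, 28, 25), Bad = c(33, 29, 24),
  Great = c(21, 30, 21), Poor = c(32, 29, 26),
  stringsAsFactors = FALSE
)

# Hypothetical lookup with one entry ("Missing") that has no column in df
df2 <- data.frame(
  Name = c("Good", "Great", "Bad", "Fair", "Missing"),
  Adjusted_Name = c("good_run", "very_great_work", "bad_run",
                    "fair_run_decent", "not_used"),
  stringsAsFactors = FALSE
)

# Named vector of the form c(new_name = "old_name")
lookup <- setNames(df2$Name, df2$Adjusted_Name)

result <- df %>%
  # any_of() silently skips lookup entries that are absent from df
  rename(any_of(lookup)) %>%
  # keep the identifier columns plus whichever renamed columns exist
  select(Name, Reference, any_of(names(lookup)))
```

Here the renamed columns come out in the order they appear in df2, and both Poor and the unused "Missing" entry are left out of the result.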

Data

df <- structure(list(Name = c("George", "Frank", "Bertha"), Reference = c("Hill", 
"Stairs", "Trail"), Good = c(34, 29, 25), Fair = c(21, 28, 25
), Bad = c(33, 29, 24), Great = c(21, 30, 21), Poor = c(32, 29, 
26)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L), spec = structure(list(cols = list(Name = structure(list(), class = c("collector_character", 
"collector")), Reference = structure(list(), class = c("collector_character", 
"collector")), Good = structure(list(), class = c("collector_double", 
"collector")), Fair = structure(list(), class = c("collector_double", 
"collector")), Bad = structure(list(), class = c("collector_double", 
"collector")), Great = structure(list(), class = c("collector_double", 
"collector")), Poor = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

df2 <- structure(list(Name = c("Good", "Great", "Bad", "Fair"), Adjusted_Name = c("good_run", 
"very_great_work", "bad_run", "fair_run_decent")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(Name = structure(list(), class = c("collector_character", 
    "collector")), Adjusted_Name = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
