Replacing column names with another data frame if matches

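Hi, I'm looking into how to match data frames together by column and then rename the columns. If a column has no matching name, I want to drop it.

For example, I would take this main dataset, call it DF1: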

Name Reference Good Fair Bad Great Poor
George Hill 34 21 33 21 32
Frank Stairs 29 28 29 30 29
Bertha Trail 25 25 24 21 26

Then there is another data frame, call it DF2, that lets me replace the column names of DF1:

Name Adjusted_Name
Good good_run
Great very_great_work
Bad bad run
Fair fair run decent

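Essentially, the words being replaced won't follow any kind of pattern. I want to take the first column of DF2 and match it against the column names of DF1. If there is a match between DF2$Name and a column name of DF1 (whichever column it is), I replace that name with the value of DF2$Adjusted_Name from the same row. If there is no match, the column is dropped from DF1.

So the end goal is to get: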

Name Reference good_run fair run decent Bad run very_great_work
George Hill 34 21 33 21
Frank Stairs 29 28 29 30
Bertha Trail 25 25 24 21

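In this case, "Poor" is dropped because it has no match in DF2$Name.

How would I go about this? How would I handle it if there were thousands of columns, and would that change how I code it? I'm somewhat new to R, so any tips would be appreciated. Thanks!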

Try the following. Using the list of adjusted names, you can grep the desired words against the column names and subset the data frame on the result:

Data

df <- read.table(header = TRUE, text = "Name    Reference   Good    Fair    Bad Great   Poor
                 George Hill    34  21  33  21  32
                 Frank  Stairs  29  28  29  30  29
                 Bertha Trail   25  25  24  21  26")

adj_name <- c("good_run","very_great_work","bad run","fair run decent")

Index the columns with grep against the string of desired words (note the tolower() on the column names):

desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")

df[,c(1:2,grep(desired_words, tolower(names(df))))]

Output

#    Name Reference Good Fair Bad Great
#1 George      Hill   34   21  33    21
#2  Frank    Stairs   29   28  29    30
#3 Bertha     Trail   25   25  24    21
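
The grep approach above keeps the matching columns but leaves their original names, and it matches substrings rather than whole names. As a minimal base R sketch of the exact-match-and-rename step the question asks for, one option is match(). This assumes the adjusted names use underscores (as in the tidyverse answer's df2) and that Name and Reference are identifier columns that should always be kept; df1, id_cols, idx, keep and new_names are illustrative names, not part of the original post.

```r
# Rebuild the example data from the question (DF1 and DF2).
df1 <- data.frame(
  Name      = c("George", "Frank", "Bertha"),
  Reference = c("Hill", "Stairs", "Trail"),
  Good  = c(34, 29, 25),
  Fair  = c(21, 28, 25),
  Bad   = c(33, 29, 24),
  Great = c(21, 30, 21),
  Poor  = c(32, 29, 26),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name          = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad_run", "fair_run_decent"),
  stringsAsFactors = FALSE
)

# Assumption: these identifier columns are always kept under their own names.
id_cols <- c("Name", "Reference")

# Position of each DF1 column name in DF2$Name (NA when there is no match).
idx <- match(names(df1), df2$Name)

# Keep the identifier columns plus every column that has a match in DF2$Name.
keep <- names(df1) %in% id_cols | !is.na(idx)

# Matched columns get their adjusted name; unmatched ones keep the old name.
new_names <- ifelse(is.na(idx), names(df1), df2$Adjusted_Name[idx])

# Subset first, then apply the new names to the surviving columns.
result <- setNames(df1[, keep, drop = FALSE], new_names[keep])
names(result)
```

Because match() and the subsetting are vectorized over the column names, the same lines should apply unchanged to a data frame with thousands of columns.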

If you are open to a tidyverse solution, you could use

library(dplyr)
library(tibble)

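# deframe(df2) turns df2 into a named vector: c(Good = "good_run", ...)
df %>% 
  # rename only the columns listed in df2$Name; any_of() skips names absent from df
  rename_with(~ unname(deframe(df2)[.x]), .cols = any_of(df2$Name)) %>% 
  # keep the identifier columns plus the renamed ones; everything else (Poor) is dropped
  select(Name, Reference, any_of(df2$Adjusted_Name))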

This returns

# A tibble: 3 x 6
  Name   Reference good_run very_great_work bad_run fair_run_decent
  <chr>  <chr>        <dbl>           <dbl>   <dbl>           <dbl>
1 George Hill            34              21      33              21
2 Frank  Stairs          29              30      29              28
3 Bertha Trail           25              21      24              25
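
As a possible variation, the rename and the select can be folded into one step, because any_of() accepts a named character vector of the form c(new_name = "old_name") and renames while selecting. The sketch below rebuilds the example data as plain data frames so it runs on its own; lookup and result are illustrative names, not part of the original answer.

```r
library(dplyr)

# Example data, rebuilt here so the snippet is self-contained.
df <- data.frame(
  Name      = c("George", "Frank", "Bertha"),
  Reference = c("Hill", "Stairs", "Trail"),
  Good  = c(34, 29, 25),
  Fair  = c(21, 28, 25),
  Bad   = c(33, 29, 24),
  Great = c(21, 30, 21),
  Poor  = c(32, 29, 26),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name          = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad_run", "fair_run_decent"),
  stringsAsFactors = FALSE
)

# Named vector in the form new_name = "old_name".
lookup <- setNames(df2$Name, df2$Adjusted_Name)

# any_of() selects the old names that exist in df and renames them to the
# vector's names; columns not mentioned (here Poor) are dropped by select().
result <- df %>%
  select(Name, Reference, any_of(lookup))

names(result)
```

With this form the renamed columns come out in the order of df2 rather than the order of df, and names in df2 that do not occur in df are silently ignored.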

Data

df <- structure(list(Name = c("George", "Frank", "Bertha"), Reference = c("Hill", 
"Stairs", "Trail"), Good = c(34, 29, 25), Fair = c(21, 28, 25
), Bad = c(33, 29, 24), Great = c(21, 30, 21), Poor = c(32, 29, 
26)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L), spec = structure(list(cols = list(Name = structure(list(), class = c("collector_character", 
"collector")), Reference = structure(list(), class = c("collector_character", 
"collector")), Good = structure(list(), class = c("collector_double", 
"collector")), Fair = structure(list(), class = c("collector_double", 
"collector")), Bad = structure(list(), class = c("collector_double", 
"collector")), Great = structure(list(), class = c("collector_double", 
"collector")), Poor = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

df2 <- structure(list(Name = c("Good", "Great", "Bad", "Fair"), Adjusted_Name = c("good_run", 
"very_great_work", "bad_run", "fair_run_decent")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(Name = structure(list(), class = c("collector_character", 
    "collector")), Adjusted_Name = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))