Replacing column names using another data frame when they match
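Hi, I'm trying to work out how to match a data frame's columns against another data frame and then rename them. If a column has no matching name, I want to drop it.
For example, I'd use this main dataset, call it DF1: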
Name | Reference | Good | Fair | Bad | Great | Poor |
---|---|---|---|---|---|---|
George | Hill | 34 | 21 | 33 | 21 | 32 |
Frank | Stairs | 29 | 28 | 29 | 30 | 29 |
Bertha | Trail | 25 | 25 | 24 | 21 | 26 |
Then there is another data frame, DF2, which I'd use to replace DF1's column names:
Name | Adjusted_Name |
---|---|
Good | good_run |
Great | very_great_work |
Bad | bad run |
Fair | fair run decent |
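Essentially, the words to be replaced won't follow any kind of pattern. I'd try to match the first column of DF2 against DF1: if a value in DF2$Name matches a column name in DF1 (whichever column it is), I'd replace that column name with the value from the same row of DF2$Adjusted_Name. If there is no match, the column is dropped from DF1.
So the end goal is to get: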
Name | Reference | good_run | fair run decent | bad run | very_great_work |
---|---|---|---|---|---|
George | Hill | 34 | 21 | 33 | 21 |
Frank | Stairs | 29 | 28 | 29 | 30 |
Bertha | Trail | 25 | 25 | 24 | 21 |
In this case, "Poor" is dropped because it has no match in DF2$Name.
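The rule described above maps fairly directly onto base R's `match()`. Below is a minimal sketch (not taken from either answer), assuming that `Name` and `Reference` are identifier columns that should always be kept unchanged; `id_cols`, `idx` and `selected` are illustrative names introduced here:

```r
# Example data from the question; stringsAsFactors = FALSE keeps the names as character
DF1 <- data.frame(
  Name = c("George", "Frank", "Bertha"),
  Reference = c("Hill", "Stairs", "Trail"),
  Good = c(34, 29, 25), Fair = c(21, 28, 25), Bad = c(33, 29, 24),
  Great = c(21, 30, 21), Poor = c(32, 29, 26),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Name = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad run", "fair run decent"),
  stringsAsFactors = FALSE
)

id_cols <- c("Name", "Reference")   # assumption: always kept, never renamed
idx <- match(names(DF1), DF2$Name)  # row of DF2 matching each DF1 column, NA if none
selected <- names(DF1) %in% id_cols | !is.na(idx)

result <- DF1[, selected]
# New name where DF2 has a match, otherwise keep the original name
new_names <- ifelse(is.na(idx), names(DF1), DF2$Adjusted_Name[idx])
names(result) <- new_names[selected]
result
```

Columns keep DF1's order, so `Poor` simply disappears and the rest are renamed in place.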
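How would I go about this? And if there were thousands of columns, how would I handle that; would it change how I write the code? I'm fairly new to R, so any tips would be much appreciated. Thanks!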
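Because `match()` and `%in%` are vectorised, the same idea should not need a loop even with thousands of columns. A minimal sketch wrapping it in a helper; `rename_by_lookup()` and the synthetic 5,000-column data are hypothetical, made up purely for illustration:

```r
# Hypothetical helper: rename columns of `df` via a two-column lookup table and
# drop columns that have no entry, except those listed in `keep`
rename_by_lookup <- function(df, lookup, keep = character(0),
                             from = "Name", to = "Adjusted_Name") {
  idx <- match(names(df), lookup[[from]])    # NA where a column has no lookup entry
  matched <- !is.na(idx)
  selected <- matched | names(df) %in% keep  # matched columns plus always-kept ones
  new_names <- names(df)
  new_names[matched] <- as.character(lookup[[to]][idx[matched]])
  out <- df[, selected, drop = FALSE]
  names(out) <- new_names[selected]
  out
}

# Synthetic wide data: an id column plus 5000 numeric columns col1 ... col5000
set.seed(1)
values <- as.data.frame(matrix(runif(3 * 5000), nrow = 3))
names(values) <- paste0("col", 1:5000)
wide <- data.frame(id = c("a", "b", "c"), values, stringsAsFactors = FALSE)

# Lookup covering every odd-numbered column
lookup <- data.frame(
  Name = paste0("col", seq(1, 5000, by = 2)),
  Adjusted_Name = paste0("renamed_", seq(1, 5000, by = 2)),
  stringsAsFactors = FALSE
)

res <- rename_by_lookup(wide, lookup, keep = "id")
dim(res)
```

The lookup's column names default to `Name` and `Adjusted_Name` as in DF2; pass `from` and `to` if yours differ.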
Try the following: using the list of adjusted names, you can grep the list of desired words against the column names and subset the data frame on that:
Data
df <- read.table(header = TRUE, text = "Name Reference Good Fair Bad Great Poor
George Hill 34 21 33 21 32
Frank Stairs 29 28 29 30 29
Bertha Trail 25 25 24 21 26")
adj_name <- c("good_run","very_great_run","bad run","fair run decent")
Index the columns using a grep against the desired name strings (note also the tolower() applied to the column names):
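# Split each adjusted name on "_" or " " and join the pieces into one regex alternation
desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")
# Keep the first two columns plus every column whose lower-cased name matches one of the words
df[, c(1:2, grep(desired_words, tolower(names(df))))]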
Output
# Name Reference Good Fair Bad Great
#1 George Hill 34 21 33 21
#2 Frank Stairs 29 28 29 30
#3 Bertha Trail 25 25 24 21
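To see what the `grep()` step above actually matches, here is the intermediate regex built from `adj_name`, evaluated on the question's column names. Note that this approach only selects columns (the names stay `Good`, `Fair`, and so on, as the output shows), and because it matches substrings, a column whose name merely contains one of the words would also be picked up; the `Runtime` column below is hypothetical:

```r
# adj_name as in the answer above
adj_name <- c("good_run", "very_great_run", "bad run", "fair run decent")
desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")
desired_words
# [1] "good|run|very|great|run|bad|run|fair|run|decent"

cols <- c("Name", "Reference", "Good", "Fair", "Bad", "Great", "Poor")
grep(desired_words, tolower(cols))
# [1] 3 4 5 6

# Substring matching: a hypothetical "Runtime" column would also be selected
grepl(desired_words, tolower("Runtime"))
# [1] TRUE
```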
If you're open to a tidyverse solution, you could use:
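library(dplyr)
library(tibble)

df %>%
  # deframe(df2) is a named vector c(Good = "good_run", ...); indexing it by old names gives the new ones
  rename_with(~ deframe(df2)[.x], .cols = any_of(df2$Name)) %>%
  # keep the ID columns plus the renamed ones; unmatched columns such as Poor are dropped
  select(Name, Reference, any_of(df2$Adjusted_Name))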
This returns:
# A tibble: 3 x 6
Name Reference good_run very_great_work bad_run fair_run_decent
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 George Hill 34 21 33 21
2 Frank Stairs 29 30 29 28
3 Bertha Trail 25 21 24 25
Data
df <- structure(list(Name = c("George", "Frank", "Bertha"), Reference = c("Hill",
"Stairs", "Trail"), Good = c(34, 29, 25), Fair = c(21, 28, 25
), Bad = c(33, 29, 24), Great = c(21, 30, 21), Poor = c(32, 29,
26)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L), spec = structure(list(cols = list(Name = structure(list(), class = c("collector_character",
"collector")), Reference = structure(list(), class = c("collector_character",
"collector")), Good = structure(list(), class = c("collector_double",
"collector")), Fair = structure(list(), class = c("collector_double",
"collector")), Bad = structure(list(), class = c("collector_double",
"collector")), Great = structure(list(), class = c("collector_double",
"collector")), Poor = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df2 <- structure(list(Name = c("Good", "Great", "Bad", "Fair"), Adjusted_Name = c("good_run",
"very_great_work", "bad_run", "fair_run_decent")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), Adjusted_Name = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
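The key step in the tidyverse answer is `deframe(df2)[.x]`: `deframe()` turns the two-column `df2` into a named character vector, so indexing it by old column names returns the new ones. A rough base R equivalent using `setNames()`, shown only to illustrate the lookup (`lookup`, `old` and `new_names` are illustrative names). Looking up a name that is not in the table, such as `Poor`, yields `NA`, which is why the renaming is restricted to the columns listed in `df2$Name`:

```r
# Plain data.frame stand-in for df2 from the answer (only these two columns matter)
df2 <- data.frame(
  Name = c("Good", "Great", "Bad", "Fair"),
  Adjusted_Name = c("good_run", "very_great_work", "bad_run", "fair_run_decent"),
  stringsAsFactors = FALSE
)

# Named vector: names are the old column names, values the new ones
lookup <- setNames(df2$Adjusted_Name, df2$Name)

old <- c("Good", "Fair", "Poor")
new_names <- lookup[old]  # "Poor" has no entry, so its element is NA
new_names
```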