Add a new column that maps one character string onto a new character string based on a "Rosetta Stone" data frame?

Question

I have a data frame in R.

I'm trying to add (mutate) a new column that maps several old strings onto new strings, using a map / translation / "Rosetta Stone" data frame that defines the strings I want to replace.

I was thinking of something involving dplyr::mutate and some function that applies gsub, but I can't put the pieces together.

Starting data frame:

  starting_df <- read.table(header=TRUE, text="
  ID   Genotype
  VIT_123_1    0
  ROM_456_2    0
  VIT_78_1     1
  BELG_910_1   1
")

"Rosetta Stone" data frame:

  map_df <- read.table(header=TRUE, text="
  ID   New_ID
  VIT   VCO1
  ROM   VRO1
  BELG  VBE2
")

Desired output data frame:

  > head(updated_df)
    ID           Genotype    New_ID
    VIT_123_1    0           VCO1_123_1
    ROM_456_2    0           VRO1_456_2
    VIT_78_1     1           VCO1_78_1
    BELG_910_1   1           VBE2_910_1

Answer 1

You can use str_replace_all from the stringr package.

First, convert your map_df data frame into a named vector:

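# Named vector: names are the old strings (used as regex patterns),
# values are the strings that replace them
map_v = as.character(map_df$New_ID)
names(map_v) = as.character(map_df$ID)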

Then replace the old values with the new ones:

library(stringr)
res = starting_df
# For each name in map_v, replace its matches in ID with the corresponding value
res$New_ID = str_replace_all(starting_df$ID, map_v)
res

          ID Genotype     New_ID
1  VIT_123_1        0 VCO1_123_1
2  ROM_456_2        0 VRO1_456_2
3   VIT_78_1        1  VCO1_78_1
4 BELG_910_1        1 VBE2_910_1
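str_replace_all treats each name of the vector as a regular expression, so an unanchored key such as "ROM" would also match in the middle of an ID. As a minimal base-R sketch of the same idea (no stringr), assuming every ID begins with a map key followed by an underscore, the keys can be anchored with ^ and applied one at a time with sub():

```r
# Base-R sketch, assuming each ID starts with a map key followed by "_"
starting_df <- data.frame(
  ID = c("VIT_123_1", "ROM_456_2", "VIT_78_1", "BELG_910_1"),
  Genotype = c(0, 0, 1, 1)
)
map_df <- data.frame(ID = c("VIT", "ROM", "BELG"),
                     New_ID = c("VCO1", "VRO1", "VBE2"))

# Anchor each key at the start of the string and include the underscore,
# so a key cannot match somewhere in the middle of an ID
patterns <- paste0("^", map_df$ID, "_")
replacements <- paste0(map_df$New_ID, "_")

# Apply the substitutions one after another
new_id <- as.character(starting_df$ID)
for (i in seq_along(patterns)) {
  new_id <- sub(patterns[i], replacements[i], new_id)
}

updated_df <- starting_df
updated_df$New_ID <- new_id
print(updated_df)
```

Because the substitutions run in sequence, this relies on no new prefix also being an old key (true for VCO1, VRO1 and VBE2). Inside a dplyr pipeline the loop result could be assigned with mutate() instead of `$<-`.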

Answer 2

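You can do this without stringr by using the match function. Because each ID only starts with a key from map_df (e.g. "VIT_123_1" vs. "VIT"), matching the full ID would return NA, so match on the prefix before the first underscore and paste the rest of the ID back on:

updated_df <- starting_df # this is simply because your question specifies a new dataframe
prefix <- sub("_.*$", "", updated_df$ID)    # "VIT_123_1" -> "VIT"
suffix <- sub("^[^_]*", "", updated_df$ID)  # "VIT_123_1" -> "_123_1"
updated_df$New_ID <- paste0(map_df$New_ID[match(prefix, map_df$ID)], suffix)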
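With a prefix-based match() lookup, a prefix that is missing from map_df yields NA, and pasting that NA onto the suffix produces a string such as "NA_1_1" rather than a missing value. A sketch of a hypothetical helper, remap_ids(), that keeps such IDs unchanged instead (the function name and the fallback behaviour are assumptions, not part of the answer above):

```r
# Hypothetical helper: swap the prefix before the first "_" using a lookup
# table, leaving IDs whose prefix is not in the table unchanged
remap_ids <- function(ids, from, to) {
  ids <- as.character(ids)
  prefix <- sub("_.*$", "", ids)                 # "VIT_123_1" -> "VIT"
  suffix <- substring(ids, nchar(prefix) + 1)    # "VIT_123_1" -> "_123_1"
  new_prefix <- as.character(to)[match(prefix, as.character(from))]
  # paste0(NA, ...) would give the string "NA_...", so fall back explicitly
  ifelse(is.na(new_prefix), ids, paste0(new_prefix, suffix))
}

map_df <- data.frame(ID = c("VIT", "ROM", "BELG"),
                     New_ID = c("VCO1", "VRO1", "VBE2"))
ids <- c("VIT_123_1", "ROM_456_2", "XYZ_1_1")   # "XYZ" is not in map_df
print(remap_ids(ids, map_df$ID, map_df$New_ID))
```

The as.character() calls make the helper behave the same whether the columns were read in as factors (the read.table default before R 4.0.0) or as character vectors.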

