Match two tables' columns when the elements are separated by slashes

I have a data frame that looks like this:

Names Identification
Animals 15/20/25/26
Fruits 1/2/3/4

Another data frame looks like this:

Id Identification
Cat 15
Dog 20
Elephant 25
Mouse 26
Banana 1
Melon 2
Mango 3
Apple 4

I want to match the identification codes in the first table against the second one, to create a new column in the original table:

Names Identification Id
Animals 15/20/25/26 Cat/Dog/Elephant/Mouse
Fruits 1/2/3/4 Banana/Melon/Mango/Apple

Here is the code for the tables:


original <- data.frame(
    Names = c('Animals', 'Fruits'),
    Identification = c('15/20/25/26', '1/2/3/4')
)


to_match <- data.frame(
    Id = c('Cat', 'Dog', 'Elephant', 'Mouse','Banana', 'Melon', 'Mango','Apple'),
    Identification = c(15,20,25,26,1,2,3,4)
)

This is what I tried. It does not work because of the slashes.

original$Id <- to_match$Id[match(original$Identification, to_match$Identification)]
    

Any help would be appreciated.

You can use separate_rows to get one row per code, join, and then group again:

library(tidyverse)

a <- read.table(text = "Names   Identification
Animals 15/20/25/26
Fruits  1/2/3/4", header = TRUE)

b <- read.table(text = "Id  Identification
Cat 15
Dog 20
Elephant    25
Mouse   26
Banana  1
Melon   2
Mango   3
Apple   4", header = TRUE)

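a %>%
  # one row per identification code
  separate_rows(Identification, sep = "/") %>%
  # b's Identification is read as integer, so convert it to character before joining
  left_join(b %>%
              mutate(across(everything(), as.character)), by = "Identification") %>%
  # collapse back to one row per name
  group_by(Names) %>%
  summarise(Identification = str_c(Identification, collapse = "/"),
            Id = str_c(Id, collapse = "/"),
            .groups = "drop")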

# # A tibble: 2 x 3
#   Names   Identification Id
#   <chr>   <chr>          <chr>
# 1 Animals 15/20/25/26    Cat/Dog/Elephant/Mouse
# 2 Fruits  1/2/3/4        Banana/Melon/Mango/Apple
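
For comparison, the same split, look up, and re-collapse idea can be sketched in base R without extra packages. This is only a sketch using the `original` and `to_match` data frames from the question; the intermediate name `codes` is made up for illustration, and a code with no match would show up as the text "NA" in the result.

```r
original <- data.frame(
  Names = c("Animals", "Fruits"),
  Identification = c("15/20/25/26", "1/2/3/4")
)
to_match <- data.frame(
  Id = c("Cat", "Dog", "Elephant", "Mouse", "Banana", "Melon", "Mango", "Apple"),
  Identification = c(15, 20, 25, 26, 1, 2, 3, 4)
)

# Split each slash-separated string into its individual codes
# (as.character() guards against factor columns in older R versions)
codes <- strsplit(as.character(original$Identification), "/", fixed = TRUE)

# Look up every code in to_match, then glue the matched names back together
original$Id <- vapply(codes, function(x) {
  idx <- match(x, as.character(to_match$Identification))
  paste(to_match$Id[idx], collapse = "/")
}, character(1))

original$Id
# [1] "Cat/Dog/Elephant/Mouse"   "Banana/Melon/Mango/Apple"
```

Because `match()` compares whole codes, "1" can never be confused with the "1" inside "15".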

stringi has a function that lets you do this:

library(stringi)

original$Id <- stri_replace_all_fixed(original$Identification, to_match$Identification, to_match$Id, vectorize_all = FALSE)
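
One thing to watch with this approach: with `vectorize_all = FALSE` the patterns are applied one after another, and a fixed pattern also matches inside longer codes. With the data in the question it happens to work because the two-digit codes come first in `to_match`. The sketch below uses a small hypothetical lookup (not the asker's data) to show what can go wrong when a shorter code comes first, and one possible workaround that wraps each code in regex word boundaries.

```r
library(stringi)

ids <- "15/1"
# Hypothetical lookup in which the shorter code "1" comes before "15"
lookup <- data.frame(
  Id = c("Banana", "Cat"),
  Identification = c("1", "15")
)

# Fixed replacement: "1" is replaced first, including the "1" inside "15"
naive <- stri_replace_all_fixed(ids, lookup$Identification, lookup$Id,
                                vectorize_all = FALSE)
naive
# [1] "Banana5/Banana"

# Word boundaries make each pattern match only a complete code
patterns <- paste0("\\b", lookup$Identification, "\\b")
safe <- stri_replace_all_regex(ids, patterns, lookup$Id,
                               vectorize_all = FALSE)
safe
# [1] "Cat/Banana"
```

This assumes the replacement names contain no digits and no `$` or `\` characters, which have a special meaning in regex replacements.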