Match two tables' columns when the elements are separated by slashes
I have a data frame that looks like this:
Names | Identification |
---|---|
Animals | 15/20/25/26 |
Fruits | 1/2/3/4 |
Another data frame looks like this:
Id | Identification |
---|---|
Cat | 15 |
Dog | 20 |
Elephant | 25 |
Mouse | 26 |
Banana | 1 |
Melon | 2 |
Mango | 3 |
Apple | 4 |
I want to match the Identification codes of the first table against the second table to create a new column in the original table:
Names | Identification | Id |
---|---|---|
Animals | 15/20/25/26 | Cat/Dog/Elephant/Mouse |
Fruits | 1/2/3/4 | Banana/Melon/Mango/Apple |
Here is the code for the tables:
original <- data.frame(
Names = c('Animals', 'Fruits'),
Identification = c('15/20/25/26', '1/2/3/4')
)
to_match <- data.frame(
Id = c('Cat', 'Dog', 'Elephant', 'Mouse','Banana', 'Melon', 'Mango','Apple'),
Identification = c(15,20,25,26,1,2,3,4)
)
This is what I tried, but it doesn't work because of the slashes:
original$Id <- to_match$Id[match(original$Identification, to_match$Identification)]
Any help would be appreciated.
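A minimal base-R sketch of why the attempt finds no matches and how splitting first fixes it. It rebuilds the question's two data frames (with stringsAsFactors = FALSE added, an assumption so it also behaves on R versions before 4.0); the variable names whole and pieces are illustrative only.

```r
original <- data.frame(
  Names = c('Animals', 'Fruits'),
  Identification = c('15/20/25/26', '1/2/3/4'),
  stringsAsFactors = FALSE
)
to_match <- data.frame(
  Id = c('Cat', 'Dog', 'Elephant', 'Mouse', 'Banana', 'Melon', 'Mango', 'Apple'),
  Identification = c(15, 20, 25, 26, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# match() compares whole values: "15/20/25/26" equals none of the single ids
whole <- match(original$Identification, to_match$Identification)
print(whole)
#> [1] NA NA

# Split each string on "/", look up every piece, then paste the names back together
pieces <- strsplit(original$Identification, "/", fixed = TRUE)
original$Id <- vapply(
  pieces,
  function(ids) paste(to_match$Id[match(ids, to_match$Identification)], collapse = "/"),
  character(1)
)
# original$Id is now "Cat/Dog/Elephant/Mouse" and "Banana/Melon/Mango/Apple"
print(original)
```

match() coerces the numeric ids to character before comparing, so "15" matches 15 once the string has been split.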
You can use separate_rows and then group back together:
library(tidyverse)
a <- read.table(text = "Names Identification
Animals 15/20/25/26
Fruits 1/2/3/4", header = T)
b <- read.table(text = "Id Identification
Cat 15
Dog 20
Elephant 25
Mouse 26
Banana 1
Melon 2
Mango 3
Apple 4", header = T)
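a %>%
  # one row per id: "15/20/25/26" becomes 15, 20, 25, 26
  separate_rows(Identification) %>%
  # the split pieces are character, so make b's key character as well
  left_join(b %>%
              mutate_all(as.character), by = "Identification") %>%
  # collapse back to one row per Names, joining ids and names with "/"
  group_by(Names) %>%
  summarise(Identification = str_c(Identification, collapse = "/"),
            Id = str_c(Id, collapse = "/"),
            .groups = "drop")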
# # A tibble: 2 x 3
# Names Identification Id
# <chr> <chr> <chr>
# 1 Animals 15/20/25/26 Cat/Dog/Elephant/Mouse
# 2 Fruits 1/2/3/4 Banana/Melon/Mango/Apple
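The answer relies on the default sep of separate_rows(), which splits on any run of characters other than letters, digits and "."; that happens to include "/". A variant, assuming tidyr and dplyr 1.0 or later, passes sep = "/" explicitly and uses convert = TRUE so the pieces become integers and join against b without converting b to character. paste() stands in for str_c() here only so stringr is not needed.

```r
library(dplyr)
library(tidyr)

a <- data.frame(
  Names = c("Animals", "Fruits"),
  Identification = c("15/20/25/26", "1/2/3/4"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  Id = c("Cat", "Dog", "Elephant", "Mouse", "Banana", "Melon", "Mango", "Apple"),
  Identification = c(15L, 20L, 25L, 26L, 1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

result <- a %>%
  # split only on "/" and convert the pieces to integer so they match b's key type
  separate_rows(Identification, sep = "/", convert = TRUE) %>%
  left_join(b, by = "Identification") %>%
  # rebuild one row per Names; row order within each group is preserved
  group_by(Names) %>%
  summarise(
    Identification = paste(Identification, collapse = "/"),
    Id = paste(Id, collapse = "/"),
    .groups = "drop"
  )
print(result)
```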
stringi has a function that lets you do this:
library(stringi)
original$Id <- stri_replace_all_fixed(original$Identification, to_match$Identification, to_match$Id, vectorize_all = FALSE)
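With vectorize_all = FALSE, stri_replace_all_fixed() applies the patterns one after another and matches substrings, so the result depends on the row order of to_match. The sketch below, assuming only stringi, repeats the call with the lookup order reversed (so "1" and "2" are replaced inside "15", "20", ...) and then shows a token-level alternative that splits on "/" and translates whole ids through a named vector. The names codes, labels, lookup and by_token are illustrative.

```r
library(stringi)

ids      <- c("15/20/25/26", "1/2/3/4")
codes    <- c("15", "20", "25", "26", "1", "2", "3", "4")
labels   <- c("Cat", "Dog", "Elephant", "Mouse", "Banana", "Melon", "Mango", "Apple")
expected <- c("Cat/Dog/Elephant/Mouse", "Banana/Melon/Mango/Apple")

# Sequential substring replacement in the question's order: two-digit ids go first
in_order <- stri_replace_all_fixed(ids, codes, labels, vectorize_all = FALSE)

# Same call with the order reversed: "2" and "1" now hit digits inside "15", "20", ...
reversed <- stri_replace_all_fixed(ids, rev(codes), rev(labels), vectorize_all = FALSE)

# Token-level lookup: split on "/" and translate each whole id, so order no longer matters
lookup   <- setNames(labels, codes)
tokens   <- stri_split_fixed(ids, "/")
by_token <- vapply(tokens, function(x) paste(lookup[x], collapse = "/"), character(1))

print(in_order)
print(reversed)
print(by_token)
```

The lookup version only relies on each id appearing as a whole token between slashes, which is what the question describes.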