R: "vlookup" based on partial string matches in R
I have two data frames.
The first:
NAME
1 SMALL H
2 ZITT M
3 SMITH E
4 GLANZEL W
5 HUANG MH
6 THIJS B
and the second:
name                          address
SIBLEY B                      SOME ADDRESS 1
STEWART C;KOCH A              SOME ADDRESS 2
HILL GM;LEE A;SMITH E         SOME ADDRESS 3
DAVIS L                       SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A   SOME ADDRESS 5
DAVIDSON S;BEKIARI A          SOME ADDRESS 6
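I want to match NAME from the first table against every entry where it appears inside the name string of the second table, and then add the data from the address column, a bit like a vlookup. It also has to handle multiple instances of the same name. In the example above, the name SMITH E (different people) would give two matches, producing the following result: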
  NAME        ADDRESS 1        ADDRESS 2
1 SMALL H
2 ZITT M
3 SMITH E     SOME ADDRESS 5   SOME ADDRESS 3
4 GLANZEL W
5 HUANG MH
6 THIJS B
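Here is a tidyverse solution. I first clean the second table by splitting each entry into individual names, so that every name sits on its own row. After that, left_join can match the entries exactly: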
library(tidyverse)

df2_clean <- df2 %>%
  mutate(name = str_split(name, ";")) %>%
  unnest(name)

df1 %>%
  left_join(df2_clean, by = c("NAME" = "name"))
#>   NAME        address
#> 1 SMALL H     <NA>
#> 2 ZITT M      <NA>
#> 3 SMITH E     SOME ADDRESS 3
#> 4 SMITH E     SOME ADDRESS 5
#> 5 GLANZEL W   <NA>
#> 6 HUANG MH    <NA>
#> 7 THIJS B     <NA>
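If you would rather not depend on the tidyverse, the same split-then-join idea can probably be done in base R as well. The following is only a sketch under that assumption: it swaps in strsplit() and merge() for str_split(), unnest() and left_join(), and the names parts, df2_long and res are made up for illustration. The data is rebuilt inline so the block runs on its own.

```r
# Base R sketch of the split-then-join approach (no packages required).
# Data rebuilt inline so the example is self-contained.
df1 <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E", "GLANZEL W", "HUANG MH", "THIJS B"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("SIBLEY B", "STEWART C;KOCH A", "HILL GM;LEE A;SMITH E",
           "DAVIS L", "MERCIER K;SMITH E;GIBBONE A", "DAVIDSON S;BEKIARI A"),
  address = paste("SOME ADDRESS", 1:6),
  stringsAsFactors = FALSE
)

# Split each semicolon-separated entry into its individual names.
parts <- strsplit(df2$name, ";", fixed = TRUE)

# One row per (name, address) pair; trimws() guards against stray spaces
# around the separators (an assumption, the sample data has none).
df2_long <- data.frame(
  name = trimws(unlist(parts)),
  address = rep(df2$address, lengths(parts)),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps names from df1 that have no match, like a left join.
# merge() sorts the result by the key, so the row order differs from df1.
res <- merge(df1, df2_long, by.x = "NAME", by.y = "name", all.x = TRUE)
print(res)
```

Because each name is compared as a whole after the split, this matches exactly rather than by substring, so a name such as SMITH E would not accidentally match a longer name that merely starts with the same characters.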
If you really want to, you can spread Smith's two addresses across two columns, but I would suggest sticking with the long format here:
df1 %>%
  left_join(df2_clean, by = c("NAME" = "name")) %>%
  group_by(NAME) %>%
  mutate(add_c = row_number()) %>%
  pivot_wider(id_cols = NAME, names_from = add_c, names_prefix = "address_", values_from = address)
#> # A tibble: 6 x 3
#> # Groups:   NAME [6]
#>   NAME        address_1        address_2
#>   <chr>       <chr>            <chr>
#> 1 SMALL H     <NA>             <NA>
#> 2 ZITT M      <NA>             <NA>
#> 3 SMITH E     SOME ADDRESS 3   SOME ADDRESS 5
#> 4 GLANZEL W   <NA>             <NA>
#> 5 HUANG MH    <NA>             <NA>
#> 6 THIJS B     <NA>             <NA>
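If the number of matches per name is not known in advance, a middle ground between the long and wide formats might be to keep one row per name and collapse all matching addresses into a single string. This is a base R sketch of that idea, not part of the solution above; the lookup list, the addresses column and the " | " separator are all hypothetical choices made for illustration.

```r
# Sketch: one row per NAME, with all matching addresses collapsed into one string.
# Data rebuilt inline so the example is self-contained.
df1 <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E", "GLANZEL W", "HUANG MH", "THIJS B"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("SIBLEY B", "STEWART C;KOCH A", "HILL GM;LEE A;SMITH E",
           "DAVIS L", "MERCIER K;SMITH E;GIBBONE A", "DAVIDSON S;BEKIARI A"),
  address = paste("SOME ADDRESS", 1:6),
  stringsAsFactors = FALSE
)

# Split the semicolon-separated names, then group addresses by individual name.
# lookup is a named list: individual name -> character vector of addresses.
parts <- strsplit(df2$name, ";", fixed = TRUE)
lookup <- split(rep(df2$address, lengths(parts)), trimws(unlist(parts)))

# For each NAME, paste its addresses together, or use NA when there is no match.
df1$addresses <- vapply(
  df1$NAME,
  function(n) {
    hits <- lookup[[n]]
    if (is.null(hits)) NA_character_ else paste(hits, collapse = " | ")
  },
  character(1),
  USE.NAMES = FALSE
)
print(df1)
```

The trade-off is that the collapsed column has to be split again before the individual addresses can be used, which is one more reason the long format tends to be easier to work with.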
Data
df1 <- read.delim(text = "NAME
SMALL H
ZITT M
SMITH E
GLANZEL W
HUANG MH
THIJS B", stringsAsFactors = FALSE)
df2 <- read.delim(text = "name,address
SIBLEY B,SOME ADDRESS 1
STEWART C;KOCH A,SOME ADDRESS 2
HILL GM;LEE A;SMITH E,SOME ADDRESS 3
DAVIS L,SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A,SOME ADDRESS 5
DAVIDSON S;BEKIARI A,SOME ADDRESS 6", sep = ",", stringsAsFactors = FALSE)