Using stringdist_join with differing column names

I have the following example data:

library(fuzzyjoin)
a <- data.frame(x = c("season", "season", "season", "package", "package"), y = c("1","2", "3", "1","6"))


b <- data.frame(x = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

The following works as expected:

d <- stringdist_left_join(a,b, by = "x", max_dist = 2)

However, joining on columns with different names does not work (note that the join is now between a and c):

e <- stringdist_left_join(a,c, by = c("x", "z"), max_dist = 2)

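I would like to tell stringdist_left_join to join on two differently named columns, as in the last line of code (e), but it does not seem to accept this.

Is there a solution other than copying the column and renaming it?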

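You can use = inside by to join on two differently named columns: the name on the left is the column in the first table, and the value on the right is the column in the second table. For example: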

e <- stringdist_left_join(a,c, by = c("x" = "z"), max_dist = 2)

Output:

         x y       z w
1   season 1  season 1
2   season 1   seson 2
3   season 1   seson 3
4   season 2  season 1
5   season 2   seson 2
6   season 2   seson 3
7   season 3  season 1
8   season 3   seson 2
9   season 3   seson 3
10 package 1 package 2
11 package 1 pakkage 6
12 package 6 package 2
13 package 6 pakkage 6
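
To make it easier to see why each pair of rows was matched, here is a minimal sketch that repeats the same join and also records the computed string distance. It assumes the distance_col argument of stringdist_left_join and the default distance method; the names e2 and dist are only illustrative and are not part of the original question.

```r
library(fuzzyjoin)

# Same data as in the question
a <- data.frame(x = c("season", "season", "season", "package", "package"),
                y = c("1", "2", "3", "1", "6"))
c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"),
                w = c("1", "2", "3", "2", "6"))

# Named vector in `by`: left-table column name = right-table column name.
# distance_col adds a column holding the string distance for each matched pair.
e2 <- stringdist_left_join(a, c, by = c("x" = "z"),
                           max_dist = 2, distance_col = "dist")

# Exact matches have distance 0; "seson" and "pakkage" are one edit away
print(e2)
```

Every row of a is paired with every row of c whose z value is within max_dist of x, which is why the result has more rows than either input. Lowering max_dist to 0 should keep only the exact matches.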