Find two columns that match in two dataframes and put third column from dataframe 2 into a new column in dataframe 1 using R
I have 2 data frames:
df1:
word1 previousWord
a NA
b a
c b
The other data frame looks like this:
df2: this contains more pairs than exist in df1; it contains every possible combination.
word1 previousWord Score
a a 1
a b .5
a c .9
b a .5
b b 1
b c .2
c a .9
c b .2
c c 1
I want to find the pairs from df1 (i.e. b-a, c-b) in df2, then copy the score from df2 and add it to a new column in df1.
For example, the output should look like this:
word1 previousWord Score
a NA NA
b a .5
c b .2
Here is what I tried, but it seems to drop a lot of the rows from df1. Swapping the order of the data frames did not fix it.
df3<-merge(df2, df1, by = c("word1", "previousWord"))
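Any help is much appreciated.
You can use left_join() from dplyr here. Unlike merge()'s default inner join, a left join keeps every row of df1 and fills Score with NA where df2 has no matching pair.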
library(dplyr)
# keep all rows of df1; add Score where (word1, previousWord) matches a row of df2
df3 <- left_join(df1, df2, by = c("word1", "previousWord"))
Output
word1 previousWord Score
1 a <NA> NA
2 b a 0.5
3 c b 0.2
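If you would rather stay in base R, the merge() call from the question can do the same left join. merge() defaults to all = FALSE, i.e. an inner join, which is why unmatched df1 rows vanished; setting all.x = TRUE keeps them. This is a sketch using the sample data from this thread, not the asker's real data.

```r
# Sample data reconstructed from the question
df1 <- data.frame(word1 = c("a", "b", "c"),
                  previousWord = c(NA, "a", "b"))
df2 <- data.frame(word1 = rep(c("a", "b", "c"), each = 3),
                  previousWord = rep(c("a", "b", "c"), times = 3),
                  Score = c(1, 0.5, 0.9, 0.5, 1, 0.2, 0.9, 0.2, 1))

# Default merge(): inner join, so the (a, NA) row of df1 has no partner and is dropped
inner <- merge(df1, df2, by = c("word1", "previousWord"))
nrow(inner)  # 2

# all.x = TRUE: left join, every df1 row is kept; unmatched rows get Score = NA
df3 <- merge(df1, df2, by = c("word1", "previousWord"), all.x = TRUE)
df3
```

Note that merge() sorts the result by the key columns by default (sort = TRUE), so the row order can differ from df1; dplyr's left_join() keeps df1's order.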
Data
df1 <- structure(list(word1 = c("a", "b", "c"), previousWord = c(NA,
"a", "b")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(word1 = c("a", "a", "a", "b", "b", "b", "c", "c",
"c"), previousWord = c("a", "b", "c", "a", "b", "c", "a", "b",
"c"), Score = c(1, 0.5, 0.9, 0.5, 1, 0.2, 0.9, 0.2, 1)), class = "data.frame", row.names = c(NA,
-9L))
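Another base-R option, which writes Score straight into df1 without reordering rows, is to build one composite key per row and look it up with match(). The pair_key() helper below is hypothetical (not from any package), and the "\u001f" separator is an assumption: it only needs to be a character that never occurs inside the words.

```r
df1 <- data.frame(word1 = c("a", "b", "c"),
                  previousWord = c(NA, "a", "b"))
df2 <- data.frame(word1 = rep(c("a", "b", "c"), each = 3),
                  previousWord = rep(c("a", "b", "c"), times = 3),
                  Score = c(1, 0.5, 0.9, 0.5, 1, 0.2, 0.9, 0.2, 1))

# Hypothetical helper: one string key per row, NA if either word is missing
# (paste() alone would turn NA into the literal text "NA")
pair_key <- function(w, prev) {
  ifelse(is.na(w) | is.na(prev), NA_character_, paste(w, prev, sep = "\u001f"))
}

# For each df1 row, the position of its pair in df2 (NA if absent);
# incomparables = NA prevents a missing key from matching a missing key in df2
idx <- match(pair_key(df1$word1, df1$previousWord),
             pair_key(df2$word1, df2$previousWord),
             incomparables = NA)
df1$Score <- df2$Score[idx]
df1
```

If a pair occurs more than once in df2, match() takes the first occurrence, whereas a join would duplicate the df1 row.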