R: Merge two data frames based on two joining conditions being met

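I have survey data (call it survey) in which a group of people answered questions. For each person I have their name, the questions they answered, and their responses, all in long format (each person's name is repeated dozens of times, once per question).

Employee Name | Question | Response

In a second data frame (call it metaData), I have extra data about a subset of the questions:

Employee Name | Question | Question Rating | Question Learning Plan | etc.

The two datasets share the Employee Name and Question columns, and the values in them should match exactly.

I need to merge() these two data frames, but neither Employee Name nor Question is enough to merge on by itself. Only the combination of Question and Employee Name is a unique ID. In pseudocode: merge(survey, metaData, where(employeeSurvey == employeeMeta && questionSurvey == questionMeta)).

For example, merging on Employee Name alone would return hundreds of matches, but there should be only one row where both the employee name and the question are the same.

How do I merge based on these two conditions?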

You should be able to put both columns in a vector, like this:

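# Example data; the key columns have different names in each data frame
survey <- data.frame(name = c("John", "John", "Jane", "Jane"), question = c(1, 2, 1, 2), answer = c("Yes", "Yes", "Yes", "No"), stringsAsFactors = FALSE)

metaData <- data.frame(first = c("John", "John", "Jane", "Jane"), quest = c(1, 2, 1, 2), age = c("20", "20", "40", "40"), stringsAsFactors = FALSE)

# Pair name with first and question with quest; rows merge only when both keys match
merge(survey, metaData, by.x = c("name", "question"), by.y = c("first", "quest"))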

  name question answer age
1 Jane        1    Yes  40
2 Jane        2     No  40
3 John        1    Yes  20
4 John        2    Yes  20
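
If the key columns have the same names in both data frames, as the question describes, a single by vector should be enough. The sketch below is only an illustration: the toy data and the rating column are made up, and the metadata deliberately covers just a subset of the name/question pairs. It also shows all.x = TRUE, which keeps survey rows that have no metadata.

```r
# Hypothetical toy data: both frames use the same key column names
survey <- data.frame(name = c("John", "John", "Jane", "Jane"),
                     question = c(1, 2, 1, 2),
                     answer = c("Yes", "Yes", "Yes", "No"),
                     stringsAsFactors = FALSE)

# Metadata exists only for a subset of the (name, question) pairs
metaData <- data.frame(name = c("John", "Jane"),
                       question = c(1, 2),
                       rating = c("easy", "hard"),  # made-up column
                       stringsAsFactors = FALSE)

# Default (inner join): only pairs present in both data frames
matched <- merge(survey, metaData, by = c("name", "question"))

# all.x = TRUE (left join): every survey row is kept, with NA where
# no metadata row matches both keys
all_rows <- merge(survey, metaData, by = c("name", "question"), all.x = TRUE)
```

Without all.x = TRUE, survey rows whose name/question pair is missing from metaData are dropped from the result.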

Merging with the dplyr package:

survey <- data.frame(name = c("John", "John", "Jane", "Jane"), question = c(1, 2, 1, 2), answer = c("Yes", "Yes", "Yes", "No"), stringsAsFactors = FALSE)

metaData <- data.frame(first = c("John", "John", "Jane", "Jane"), quest = c(1, 2, 1, 2), age = c("20", "20", "40", "40"), stringsAsFactors = FALSE)

library(dplyr)
# The named vector maps each key in survey (left) to its counterpart in metaData (right)
left_join(survey, metaData, by = c(name = "first", question = "quest"))

# or using the pipe
survey %>%
  left_join(metaData, by = c(name = "first", question = "quest"))

There are other two-table verbs as well, which follow the same logic as SQL: inner_join, right_join, and full_join.
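
To see how these verbs differ, it can help to compare row counts on data where the keys only partly overlap. This is a rough sketch with made-up data: the "Mary" row and the reduced metaData are assumptions added purely for illustration, so that some pairs exist in only one of the two data frames.

```r
library(dplyr)

survey <- data.frame(name = c("John", "John", "Jane", "Jane"),
                     question = c(1, 2, 1, 2),
                     answer = c("Yes", "Yes", "Yes", "No"),
                     stringsAsFactors = FALSE)

# Hypothetical metadata: two pairs match survey, one ("Mary", 1) does not
metaData <- data.frame(first = c("John", "Jane", "Mary"),
                       quest = c(1, 2, 1),
                       age = c("20", "40", "30"),
                       stringsAsFactors = FALSE)

keys <- c(name = "first", question = "quest")

inner <- inner_join(survey, metaData, by = keys)  # pairs found in both
left  <- left_join(survey, metaData, by = keys)   # every survey row
right <- right_join(survey, metaData, by = keys)  # every metaData row
full  <- full_join(survey, metaData, by = keys)   # every row from either side

# Row counts for each join type
c(inner = nrow(inner), left = nrow(left), right = nrow(right), full = nrow(full))
```

In every case a row is matched only when both the name and the question agree. The verbs differ only in what happens to rows that have no match.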