
Find matching elements of two dataframes and assign new value


   ss clusters  ICC items CondNum
1 300       10 0.05    10       1
2 300       10 0.05    20       2
3 300       10 0.05    50       3
4 300       10 0.05    70       4
5 300       10 0.10    10       5

235 5000      150 0.3    50     235
236 5000      150 0.3    70     236
237 5000      150 0.4    10     237
238 5000      150 0.4    20     238
239 5000      150 0.4    50     239
240 5000      150 0.4    70     240

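The design data frame above has 240 rows and 5 columns. The condition data frame has 60,000 rows and the same first 4 columns as the design data frame. Each row of the condition df matches one row of the design df (ignoring the last column). I want to assign the CondNum from the design df to the matching rows of the condition data frame. For example, the condition data looks like this: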

         ss clusters   ICC items
1      1000       10 0.053    10
2      1000       10 0.053    10
3      1000       10 0.053    10
4       300       10  0.10    20
51,998 5000      100   0.4    20
51,999 5000      100   0.4    20

and I would like the result to look like this:


         ss clusters   ICC items CondNum
1      1000       10 0.053    10     108
2      1000       10 0.053    10     108
3      1000       10 0.053    10     108
4       300       10  0.10    20       2
51,998 5000      100   0.4    20     210
51,999 5000      100   0.4    20     210


You can use left_join from dplyr:

> design
#   ss clusters  ICC items CondNum
#1 300       10 0.05    10       1
#2 300       10 0.05    20       2
#3 300       10 0.05    50       3
#4 300       10 0.05    70       4
#5 300       10 0.10    10       5

> condition
#    ss clusters  ICC items
#1  300       10 0.05    20
#2  300       10 0.05    50
#3 1000       10 0.05    70
#4  300       10 0.10    10

> dplyr::left_join(condition, design)
#Joining by: c("ss", "clusters", "ICC", "items")
#    ss clusters  ICC items CondNum
#1  300       10 0.05    20       2
#2  300       10 0.05    50       3
#3 1000       10 0.05    70      NA
#4  300       10 0.10    10       5
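
With 60,000 rows it may be safer to name the key columns yourself instead of relying on the automatic "Joining by" guess, and then to check which rows found no match. A minimal sketch, assuming dplyr is installed and reusing the small example data from this answer (the names keys, result and unmatched are introduced here for illustration only):

```r
library(dplyr)

# Small example data, same values as in this answer
design <- data.frame(
  ss       = c(300L, 300L, 300L, 300L, 300L),
  clusters = c(10L, 10L, 10L, 10L, 10L),
  ICC      = c(0.05, 0.05, 0.05, 0.05, 0.1),
  items    = c(10L, 20L, 50L, 70L, 10L),
  CondNum  = 1:5
)
condition <- data.frame(
  ss       = c(300L, 300L, 1000L, 300L),
  clusters = c(10L, 10L, 10L, 10L),
  ICC      = c(0.05, 0.05, 0.05, 0.1),
  items    = c(20L, 50L, 70L, 10L)
)

# Name the join columns explicitly (hypothetical helper variable)
keys <- c("ss", "clusters", "ICC", "items")

# left_join keeps every row of condition, in its original order
result <- left_join(condition, design, by = keys)

# Rows of condition that have no counterpart in design get NA
unmatched <- result[is.na(result$CondNum), ]
print(unmatched)
```

Since ICC is stored as a double, the values are compared exactly here. If the ICC values in either data frame were computed rather than typed in, rounding both columns to the same number of digits before joining might be needed.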

Or, as mentioned in the comments, you can use merge from base R:

> merge(condition, design, all.x = TRUE)
#    ss clusters  ICC items CondNum
#1  300       10 0.05    20       2
#2  300       10 0.05    50       3
#3  300       10 0.10    10       5
#4 1000       10 0.05    70      NA
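
Note that merge sorts the result by the key columns, so the rows no longer come back in the order of condition. If the original order matters, one base R alternative is to build a single key per row and look it up with match. This is only a sketch under the assumption that pasting the four columns together gives a unique key per design row; design_key and condition_key are illustrative names, not part of the original answer:

```r
# Small example data, same values as in this answer
design <- data.frame(
  ss       = c(300L, 300L, 300L, 300L, 300L),
  clusters = c(10L, 10L, 10L, 10L, 10L),
  ICC      = c(0.05, 0.05, 0.05, 0.05, 0.1),
  items    = c(10L, 20L, 50L, 70L, 10L),
  CondNum  = 1:5
)
condition <- data.frame(
  ss       = c(300L, 300L, 1000L, 300L),
  clusters = c(10L, 10L, 10L, 10L),
  ICC      = c(0.05, 0.05, 0.05, 0.1),
  items    = c(20L, 50L, 70L, 10L)
)

key_cols <- c("ss", "clusters", "ICC", "items")

# Collapse the four key columns into one character key per row
design_key    <- do.call(paste, c(design[key_cols], sep = "|"))
condition_key <- do.call(paste, c(condition[key_cols], sep = "|"))

# match() returns, for each condition row, the position of the
# first identical key in design (NA when there is none)
condition$CondNum <- design$CondNum[match(condition_key, design_key)]

print(condition)
```

The row order of condition is left untouched, and rows without a counterpart in design get NA in CondNum.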



## design
design <- structure(list(ss = c(300L, 300L, 300L, 300L, 300L), clusters = c(10L, 
10L, 10L, 10L, 10L), ICC = c(0.05, 0.05, 0.05, 0.05, 0.1), items = c(10L, 
20L, 50L, 70L, 10L), CondNum = 1:5), .Names = c("ss", "clusters", 
"ICC", "items", "CondNum"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

## condition
condition <- structure(list(ss = c(300L, 300L, 1000L, 300L), clusters = c(10L, 
10L, 10L, 10L), ICC = c(0.05, 0.05, 0.05, 0.1), items = c(20L, 
50L, 70L, 10L)), .Names = c("ss", "clusters", "ICC", "items"), 
class = "data.frame", row.names = c("1", "2", "3", "4"))