Find matching values in different datasets by groups in R

I have the following two datasets:

df1 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5), 
  "numbers" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  "class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5), 
  "P1" = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
  "P2" = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
  "P3" = c(75, 55, 60, 65, 33, 32, 43, 55, 70))

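I want to check whether the values in the "numbers" column of df1 are contained in columns P1, P2 and P3 of df2. I have run into two problems:

1. A value in the numbers column of df1 can occur in more than one group of df2 (groups are defined by the group column in df1 and df2).
2. The datasets have different lengths.

Is there a way to merge the two datasets and obtain the following dataset?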

df3 <- data.frame(
  "group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5), 
  "number" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  "class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"),
  "P1new"    = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0),
  "P2new"    = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 0),
  "P3new"    = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1))

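If df2$P1 contains the value from df1$numbers within the correct group, P1new takes the value 1 (and likewise for P2new and P3new). As I said, the same number can recur in different groups. For example, P3 contains the value 75 in group 1 but not in group 5, so P3new is 1 in group 1 and 0 in group 5. Any help would be appreciated.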

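Here is a data.table approach. There are some incomplete cases (rows where class is NA because that group/value pair does not appear in df1). You can easily drop them if you wish...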

library(data.table)
# Convert to data.table
setDT(df1); setDT(df2)
# Melt df2 to long format
df2.melt <- melt(df2, id.vars = "group", na.rm = TRUE)
# Join classes from df1 to df2.melt
df2.melt[df1, class := i.class, on = .(group, value = numbers)][]
# Cast back to wide format
dcast(df2.melt, group + value + class ~ paste0(variable, "New"), fun.aggregate = length)
#    group value class P1New P2New P3New
# 1:     1    55     b     1     1     1
# 2:     1    75     b     0     1     1
# 3:     2    33  <NA>     0     0     1
# 4:     2    60     a     1     1     1
# 5:     2    65     c     1     1     1
# 6:     3    32  <NA>     0     0     1
# 7:     3    33     b     0     1     0
# 8:     3    43  <NA>     0     0     1
# 9:     3    55     a     1     1     0
#10:     4    43     c     1     0     0
#11:     4    55  <NA>     0     0     1
#12:     5    60  <NA>     1     0     0
#13:     5    70     b     0     0     1
#14:     5    75     a     0     1     0
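
The incomplete rows mentioned above can be dropped by filtering on `class`. The following is a minimal sketch, assuming the data from the question; the names `res` and `res_complete` are introduced here for illustration only and are not part of the original answer.

```r
library(data.table)

# Data from the question, built directly as data.tables
df1 <- data.table(
  group   = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
  numbers = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  class   = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.table(
  group = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  P1 = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
  P2 = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
  P3 = c(75, 55, 60, 65, 33, 32, 43, 55, 70))

# Same steps as in the answer: melt, update join, cast
df2.melt <- melt(df2, id.vars = "group", na.rm = TRUE)
df2.melt[df1, class := i.class, on = .(group, value = numbers)]
res <- dcast(df2.melt, group + value + class ~ paste0(variable, "New"),
             fun.aggregate = length)

# Keep only the group/value pairs that also occur in df1
res_complete <- res[!is.na(class)]
```

Note that the df1 row with group 2 and number 32 is absent from this result, because 32 does not occur in group 2 of df2; the filtered table therefore has one row fewer than df1.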

Added in response to the comment below:

dcast(df2.melt, group + value + class ~ paste0(variable, "New"), value.var = "value")
#    group value class P1New P2New P3New
# 1:     1    55     b    55    55    55
# 2:     1    75     b    NA    75    75
# 3:     2    33  <NA>    NA    NA    33
# 4:     2    60     a    60    60    60
# 5:     2    65     c    65    65    65
# 6:     3    32  <NA>    NA    NA    32
# 7:     3    33     b    NA    33    NA
# 8:     3    43  <NA>    NA    NA    43
# 9:     3    55     a    55    55    NA
#10:     4    43     c    43    NA    NA
#11:     4    55  <NA>    NA    NA    55
#12:     5    60  <NA>    60    NA    NA
#13:     5    70     b    NA    NA    70
#14:     5    75     a    NA    75    NA
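
If the result should keep every row of df1 in its original order, as in the layout the question asks for, one alternative is to test group-wise membership row by row. This is only a base R sketch under the assumption that a strict within-group match is wanted; it is not the method used in the answer above, and the loop variable `p` and the result name `df3` are chosen here for illustration.

```r
# Data from the question
df1 <- data.frame(
  group   = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
  numbers = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  class   = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.frame(
  group = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  P1 = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
  P2 = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
  P3 = c(75, 55, 60, 65, 33, 32, 43, 55, 70))

df3 <- df1
for (p in c("P1", "P2", "P3")) {
  # For each df1 row: is its number among the values of column p
  # in the rows of df2 that belong to the same group?
  df3[[paste0(p, "new")]] <- as.integer(mapply(
    function(g, n) n %in% df2[[p]][df2$group == g],
    df1$group, df1$numbers))
}
df3
```

With this strict rule, P1new and P2new come out the same as in the df3 given in the question, while P3new differs for a few rows (for example group 4, number 43), in line with the P3New column of the data.table output above.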