Find matching values in different datasets by groups in R
I have the following two datasets:
df1 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
"numbers" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
"class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
"P1" = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
"P2" = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
"P3" = c(75, 55, 60, 65, 33, 32, 43, 55, 70))
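I want to check whether the values in the "numbers" column of df1 occur in columns P1, P2 and P3 of df2. I have run into two problems:
1. A value in df1's numbers column can occur in several groups of df2 (groups are defined by the group column in df1 and df2).
2. The two datasets have different lengths.
Is there a way to merge the two datasets and obtain the following dataset: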
df3 <- data.frame(
"group" = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
"number" = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
"class" = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"),
"P1new" = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0),
"P2new" = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 0),
"P3new" = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1))
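P1new (and likewise P2new and P3new) contains the value 1 if df2$P1 contains the value from df1$numbers within the correct group (as I said, numbers can recur in different groups). For example, P3 contains the value 75 in group 1 but not in group 5. So P3new is 1 in group 1 and 0 in group 5.
Any help would be much appreciated.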
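Here is a data.table approach.
There are some incomplete cases (where class is missing because the value is not present in df1). You can easily drop them if you like...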
library(data.table)
# Convert to data.table
setDT(df1); setDT(df2)
# Melt df2 to long format
df2.melt <- melt(df2, id.vars = "group", na.rm = TRUE)
# Join classes from df1 to df2.melt
df2.melt[df1, class := i.class, on = .(group, value = numbers)][]
# Cast back to wide format
dcast(df2.melt, group + value + class ~ paste0(variable, "New"), fun.aggregate = length)
# group value class P1New P2New P3New
# 1: 1 55 b 1 1 1
# 2: 1 75 b 0 1 1
# 3: 2 33 <NA> 0 0 1
# 4: 2 60 a 1 1 1
# 5: 2 65 c 1 1 1
# 6: 3 32 <NA> 0 0 1
# 7: 3 33 b 0 1 0
# 8: 3 43 <NA> 0 0 1
# 9: 3 55 a 1 1 0
#10: 4 43 c 1 0 0
#11: 4 55 <NA> 0 0 1
#12: 5 60 <NA> 1 0 0
#13: 5 70 b 0 0 1
#14: 5 75 a 0 1 0
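The incomplete cases mentioned above could be dropped by filtering on class before casting. This is a minimal sketch, assuming the same df1 and df2 as in the question; the names long, complete and wide are hypothetical and chosen only for illustration.

```r
library(data.table)

# Same data as in the question, built directly as data.tables
df1 <- data.table(
  group   = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
  numbers = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  class   = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.table(
  group = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  P1 = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
  P2 = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
  P3 = c(75, 55, 60, 65, 33, 32, 43, 55, 70))

# Melt df2 to long format and join the classes from df1, as above
long <- melt(df2, id.vars = "group", na.rm = TRUE)
long[df1, class := i.class, on = .(group, value = numbers)]

# Keep only group/value pairs that were matched in df1 (class is not NA)
complete <- long[!is.na(class)]

# Cast back to wide format, counting occurrences per P column
wide <- dcast(complete, group + value + class ~ paste0(variable, "New"),
              fun.aggregate = length)
```

Note that a df1 row whose number does not occur anywhere in df2 for its group (here group 2, number 32) still does not appear in wide, because the result is built from df2.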
Added in response to the comment below:
dcast(df2.melt, group + value + class ~ paste0(variable, "New"), value.var = "value")
# group value class P1New P2New P3New
# 1: 1 55 b 55 55 55
# 2: 1 75 b NA 75 75
# 3: 2 33 <NA> NA NA 33
# 4: 2 60 a 60 60 60
# 5: 2 65 c 65 65 65
# 6: 3 32 <NA> NA NA 32
# 7: 3 33 b NA 33 NA
# 8: 3 43 <NA> NA NA 43
# 9: 3 55 a 55 55 NA
#10: 4 43 c 43 NA NA
#11: 4 55 <NA> NA NA 55
#12: 5 60 <NA> 60 NA NA
#13: 5 70 b NA NA 70
#14: 5 75 a NA 75 NA
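If the goal is to keep exactly the rows of df1 and add 0/1 flags, as in the df3 layout from the question, a base R sketch along the following lines may work. It assumes exact equality is the right way to compare the numbers; p_cols and flags are illustrative names and not part of the answer above.

```r
# Same data as in the question
df1 <- data.frame(
  group   = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 5),
  numbers = c(55, 75, 60, 65, 32, 33, 55, 43, 75, 70),
  class   = c("b", "b", "a", "c", "a", "b", "a", "c", "a", "b"))
df2 <- data.frame(
  group = c(1, 1, 2, 2, 2, 3, 3, 4, 5),
  P1 = c(55, NA, 60, 65, NA, NA, 55, 43, 60),
  P2 = c(55, 75, 65, 60, NA, 33, 55, NA, 75),
  P3 = c(75, 55, 60, 65, 33, 32, 43, 55, 70))

p_cols <- c("P1", "P2", "P3")

# For every P column, test each df1 row: does its number occur in that
# column of df2 among the rows belonging to the same group?
flags <- sapply(p_cols, function(p) {
  mapply(function(g, n) as.integer(n %in% df2[[p]][df2$group == g]),
         df1$group, df1$numbers)
})
colnames(flags) <- paste0(p_cols, "new")

# One row per df1 row, with the three flag columns appended
df3 <- cbind(df1, flags)
```

With the data as posted, P1new and P2new agree with the df3 shown in the question, while P3new comes out as 0 in rows 5 to 8, because df2 does not contain those values in P3 within the same group.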