Create new column based on matches in another table

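I have two data frames that look like this (they are count tables). data1 has a column named "Method" that I want to add to data2. I only want to match on Col1, Col2, and Col3 (the Count column does not need to match). If there is a match, take Method from data1 and put it in data2. If there is no match, set the value to "Not Determined". An example of the result I am after is shown below as the data frame data_final.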

data1 <- data.frame(
  "Col1" = c("ABC", "ABC", "EFG", "XYZ"),
  "Col2" = c("AA", "AA", "AA", "BB"),
  "Col3" = c("Al", "B", "Al", "Al"),
  "Count" = c(1, 4, 6, 2),
  "Method" = c("Sample", "Dry", "Sample", "Sample")
)

data2 <- data.frame(
  "Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"),
  "Col2" = c("AA", "AA", "CC", "AA", "BB", "CC"),
  "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
  "Count" = c(1, 4, 5, 6, 2, 1)
)

I would like to create a new data frame that looks like what I described above:

data_final <- data.frame(
  "Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"),
  "Col2" = c("AA", "AA", "CC", "AA", "BB", "CC"),
  "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
  "Count" = c(1, 4, 5, 6, 2, 1),
  "Method" = c("Sample", "Dry", "Not Determined", "Sample", "Sample", "Not Determined")
)

Thanks for your help!

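In base R, you can do a merge (by default, merge() joins on every column name the two data frames share, here Col1, Col2, Col3, and Count):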

data3 <- merge( data1, data2, all.y = TRUE )

Then replace the NAs with a string of your choice:

data3[ is.na( data3[ 5 ] ), 5 ] <- "Not Determined"

This gives you

> data3
  Col1 Col2 Col3 Count         Method
1  ABC   AA   Al     1         Sample
2  ABC   AA    B     4            Dry
3  ABC   CC    C     5 Not Determined
4  EFG   AA   Al     6         Sample
5  XYZ   BB   Al     2         Sample
6  XYZ   CC    C     1 Not Determined
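
Since the question asks to match on Col1, Col2, and Col3 only, a variant that names the key columns explicitly might look like the sketch below. The names keys, lookup, and data3_keys are made up for illustration, and stringsAsFactors = FALSE is set only so the snippet behaves the same on R versions before 4.0. It assumes each key combination appears at most once in data1; repeated combinations would duplicate rows in the result.

```r
# Example data from the question
data1 <- data.frame(
  Col1 = c("ABC", "ABC", "EFG", "XYZ"),
  Col2 = c("AA", "AA", "AA", "BB"),
  Col3 = c("Al", "B", "Al", "Al"),
  Count = c(1, 4, 6, 2),
  Method = c("Sample", "Dry", "Sample", "Sample"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  Col1 = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"),
  Col2 = c("AA", "AA", "CC", "AA", "BB", "CC"),
  Col3 = c("Al", "B", "C", "Al", "Al", "C"),
  Count = c(1, 4, 5, 6, 2, 1),
  stringsAsFactors = FALSE
)

# Columns to match on (hypothetical helper name)
keys <- c("Col1", "Col2", "Col3")

# Keep only the keys and Method from data1, so Count plays no part in matching
lookup <- data1[, c(keys, "Method")]

# Left join: every row of data2 is kept, Method is NA where no key matches
data3_keys <- merge(data2, lookup, by = keys, all.x = TRUE)

# Fill the unmatched rows by column name rather than by position
data3_keys$Method[is.na(data3_keys$Method)] <- "Not Determined"
```

With the sample data this should give the same table as above, but it would still find a match if the Count values in data1 and data2 differed.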

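Note: If you are using an older version of R (< 4.0), you are probably dealing with factors and need to add the extra factor level first, before replacing the NAs, with:
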
levels( data3$Method ) <- c( levels( data3$Method ), "Not Determined" )
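
As a rough illustration of the factor issue, the sketch below forces factor columns with stringsAsFactors = TRUE (the default before R 4.0) and compares two ways of filling the NAs. Converting to character is an alternative that is not part of the answer above, and the names with_factor, as_factor, and as_char are made up for the example.

```r
# Small stand-in for a merged result whose Method column is a factor
with_factor <- data.frame(
  Method = c("Sample", "Dry", NA),
  stringsAsFactors = TRUE
)

# Approach from the note: extend the levels, then assign the new value
as_factor <- with_factor
levels(as_factor$Method) <- c(levels(as_factor$Method), "Not Determined")
as_factor$Method[is.na(as_factor$Method)] <- "Not Determined"

# Alternative (assumption, not from the answer): drop the factor first
as_char <- with_factor
as_char$Method <- as.character(as_char$Method)
as_char$Method[is.na(as_char$Method)] <- "Not Determined"
```

Both end up with the same values; the first keeps Method as a factor, the second leaves it as a plain character vector.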