R: Populate a column based on matching row values in two different data frames

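I have two different data frames, 'df1' and 'df2', with six matching column names. I want to scan df2 for rows that exactly match rows in df1. Where a row matches, the 'detect' column in df1 should be set to 1; where it does not, it should be 0. Currently all values of 'detect' in df1 are 0, but I would like them changed to 1 wherever the two data frames match exactly. It looks like this: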

df1

site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 0
BMA 6/15/2021 1 NOFL Song 2 0
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 0

df2

site ddate ssegment spp vtype tperiod
BMA 6/1/2021 1 AMRO Call 1
BMC 6/15/2021 1 WISN Drum 1
BMA 6/15/2021 1 NOFL Song 2
BMC 6/29/2021 2 AMRO Drum 1
BMA 6/29/2021 2 WISN Call 2

After the scan, df1 would look like this:

df1

site ddate ssegment spp vtype tperiod detect
BMA 6/1/2021 1 AMRO Song 1 0
BMC 6/15/2021 1 WISN Drum 1 1
BMA 6/15/2021 1 NOFL Song 2 1
BMC 6/29/2021 2 AMRO Call 1 0
BMA 6/29/2021 2 WISN Call 2 1

I was thinking the base R function 'merge' might be useful, but I couldn't quite figure it out. Thanks for your help!

Start with the detect column only in df2, then merge:

df1$detect = NULL
df2$detect = 1
result = merge(df1, unique(df2), all.x = TRUE)

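This creates the detect column with 1 where there is an exact match and NA where there is none. You can change the NAs to 0s if needed. Note that merge sorts the result by the join columns by default, so the row order may differ from df1.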
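The NA replacement mentioned above could look like the sketch below. It rebuilds the question's sample data so it runs on its own; building df2 by copying df1 and overwriting vtype is only a shortcut for this example (the two sample tables differ in that column alone), not part of the answer.

```r
# Sample data from the question
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
# Shortcut for this example only: df2 equals df1 except for vtype
df2 <- df1[, names(df1) != "detect"]
df2$vtype <- c("Call", "Drum", "Song", "Drum", "Call")

# Move the flag to df2, then left-merge on the six shared columns
df1$detect <- NULL
df2$detect <- 1
result <- merge(df1, unique(df2), all.x = TRUE)

# Rows of df1 without an exact match come back as NA; set them to 0
result$detect[is.na(result$detect)] <- 0
result
```

The unique() call matters when df2 contains repeated rows: without it, each duplicate would produce an extra copy of the matching df1 row.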

The same approach works with dplyr:

library(dplyr)
df1 %>%
  select(-detect) %>%
  left_join(
    df2 %>% mutate(detect = 1) %>% unique()
  )
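As with merge, the left join leaves NA in detect for unmatched rows. A minimal sketch of one way to finish the job, assuming dplyr is installed: the explicit `by` vector and the `coalesce()` step are additions for illustration, and `key_cols` is a name made up for this example.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
# Shortcut for this example only: df2 equals df1 except for vtype
df2 <- df1 %>% select(-detect)
df2$vtype <- c("Call", "Drum", "Song", "Drum", "Call")

# Hypothetical helper name: the six columns shared by both tables
key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

result <- df1 %>%
  select(-detect) %>%
  left_join(df2 %>% distinct() %>% mutate(detect = 1), by = key_cols) %>%
  mutate(detect = coalesce(detect, 0))  # unmatched rows: NA -> 0
result
```

Unlike merge, left_join keeps the rows in df1's original order, and naming the join columns in `by` suppresses the "Joining, by" message.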

Use the filtering joins anti_join and semi_join on the two tables:

library(tidyverse)

df1 <- tribble(
  ~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod, ~detect,
  "BMA",  "6/1/2021",        1L, "AMRO", "Song",       1L,      0L,
  "BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,      0L,
  "BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,      0L,
  "BMC", "6/29/2021",        2L, "AMRO", "Call",       1L,      0L,
  "BMA", "6/29/2021",        2L, "WISN", "Call",       2L,      0L
  )

df2 <- tribble(
  ~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod,
  "BMA",  "6/1/2021",        1L, "AMRO", "Call",       1L,
  "BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,
  "BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,
  "BMC", "6/29/2021",        2L, "AMRO", "Drum",       1L,
  "BMA", "6/29/2021",        2L, "WISN", "Call",       2L
  )

bind_rows(
  df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
  df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#>   site  ddate     ssegment spp   vtype tperiod detect
#>   <chr> <chr>        <int> <chr> <chr>   <int>  <dbl>
#> 1 BMA   6/1/2021         1 AMRO  Song        1      0
#> 2 BMC   6/29/2021        2 AMRO  Call        1      0
#> 3 BMC   6/15/2021        1 WISN  Drum        1      1
#> 4 BMA   6/15/2021        1 NOFL  Song        2      1
#> 5 BMA   6/29/2021        2 WISN  Call        2      1

Created on 2021-12-08 by the reprex package (v2.0.1)
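The bind_rows output above lists the unmatched rows first, so the rows no longer follow df1's order. If the original order matters, one possible variation is to carry a temporary row index through the joins. This is a sketch rather than part of the answer; `row_id` and `key_cols` are names invented for the example.

```r
library(dplyr)

# Sample data from the question
df1 <- tibble(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
# Shortcut for this example only: df2 equals df1 except for vtype
df2 <- df1 %>% select(-detect)
df2$vtype <- c("Call", "Drum", "Song", "Drum", "Call")

key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Remember each row's position before splitting df1 in two
keyed <- df1 %>% select(-detect) %>% mutate(row_id = row_number())

result <- bind_rows(
  keyed %>% anti_join(df2, by = key_cols) %>% mutate(detect = 0),
  keyed %>% semi_join(df2, by = key_cols) %>% mutate(detect = 1)
) %>%
  arrange(row_id) %>%   # restore df1's original row order
  select(-row_id)
result
```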

Here is one possible and very simple solution using data.table.

Reprex

  • Code
library(data.table)

setDT(df1)
setDT(df2)

df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := TRUE][]
  • Output

#>    site     ddate ssegment  spp vtype tperiod detect
#> 1:  BMA  6/1/2021        1 AMRO  Song       1      0
#> 2:  BMC 6/15/2021        1 WISN  Drum       1      1
#> 3:  BMA 6/15/2021        1 NOFL  Song       2      1
#> 4:  BMC 6/29/2021        2 AMRO  Call       1      0
#> 5:  BMA 6/29/2021        2 WISN  Call       2      1
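The snippet above relies on df1 and df2 already existing. A self-contained sketch of the same update join follows, assuming the data.table package is installed. It assigns `1L` instead of `TRUE` so the value matches the integer type of detect, and passes the join columns as a character vector; both are small variations for illustration, and `dt1`, `dt2`, and `key_cols` are names chosen for this example.

```r
library(data.table)

# Sample data from the question
dt1 <- data.table(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
# Shortcut for this example only: dt2 equals dt1 except for vtype
dt2 <- copy(dt1)[, detect := NULL]
dt2[, vtype := c("Call", "Drum", "Song", "Drum", "Call")]

key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Update join: rows of dt1 with an exact match in dt2 get detect = 1,
# modified by reference; all other rows keep their existing 0
dt1[dt2, on = key_cols, detect := 1L]
dt1[]
```

Because `:=` modifies dt1 in place, no reassignment is needed, and the row order of dt1 is unchanged.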
