R: Populate column based on matching row values in two different data frames

Question

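I have two different data frames, 'df1' and 'df2', with six matching column names. I want to scan df2 for rows that exactly match rows in df1 and, if they exist, enter a 1 in the 'detect' column of df1, and a 0 if not. For now all values of 'detect' in df1 are 0, but I want them to change to 1 wherever the two data frames have an exact match. It looks something like this: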

df1

site	ddate	ssegment	spp	vtype	tperiod
BMA	6/1/2021	1	AMRO	Song	1
BMC	6/15/2021	1	WISN	Drum	1
BMA	6/15/2021	1	NOFL	Song	2
BMC	6/29/2021	2	AMRO	Call	1
BMA	6/29/2021	2	WISN	Call	2

df2

site	ddate	ssegment	spp	vtype	tperiod
BMA	6/1/2021	1	AMRO	Call	1
BMC	6/15/2021	1	WISN	Drum	1
BMA	6/15/2021	1	NOFL	Song	2
BMC	6/29/2021	2	AMRO	Drum	1
BMA	6/29/2021	2	WISN	Call	2

After scanning these, df1 would now look like:

df1

site	ddate	ssegment	spp	vtype	tperiod	detect
BMA	6/1/2021	1	AMRO	Song	1	0
BMC	6/15/2021	1	WISN	Drum	1	1
BMA	6/15/2021	1	NOFL	Song	2	1
BMC	6/29/2021	2	AMRO	Call	1	0
BMA	6/29/2021	2	WISN	Call	2	1

I was thinking the base R function 'merge' might be useful, but I couldn't quite figure it out. Thanks for your help!

Answer 1

Start with the detect column only in df2 (set to 1 for every row), then merge:

df1$detect = NULL
df2$detect = 1
result = merge(df1, unique(df2), all.x = TRUE)

This creates the detect column as 1 where there is an exact match and as NA where there is not. If needed, you can change the NAs to 0s.
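
As a minimal sketch of that last step, assuming the sample data from the question (rebuilt here so the snippet runs on its own), the NAs can be replaced with plain base R indexing. Note that merge() sorts the result by the key columns by default, so the row order may differ from the original df1.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- df1[, names(df1) != "detect"]
df2$vtype[c(1, 4)] <- c("Call", "Drum")  # the two rows that differ in df2

df1$detect <- NULL
df2$detect <- 1
result <- merge(df1, unique(df2), all.x = TRUE)

# Rows of df1 without an exact match in df2 come back as NA; set them to 0
result$detect[is.na(result$detect)] <- 0
result
```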

The same approach works with dplyr:

library(dplyr)
df1 %>% 
  select(-detect) %>%
  left_join(
    df2 %>% mutate(detect = 1) %>% unique()
  )
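
A slightly more explicit variant of the same idea, offered as a sketch rather than as the answerer's exact code: naming the join columns in `by` avoids the "Joining, by" message, and `coalesce()` turns the NAs into 0 inside the same pipeline. The `key_cols` vector is a name introduced here for illustration, and the data is rebuilt from the question so the snippet is self-contained.

```r
library(dplyr)

# Sample data from the question
df1 <- tibble(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0
)
df2 <- df1 %>%
  select(-detect) %>%
  mutate(vtype = c("Call", "Drum", "Song", "Drum", "Call"))

# key_cols is a hypothetical helper name: the six columns that must all match
key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

result <- df1 %>%
  select(-detect) %>%
  left_join(
    df2 %>% distinct() %>% mutate(detect = 1),
    by = key_cols
  ) %>%
  mutate(detect = coalesce(detect, 0))  # unmatched rows: NA -> 0
result
```

Unlike merge(), left_join() keeps the rows in the order they have in df1.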

Answer 2

dplyr has two filtering joins for a pair of tables, anti_join and semi_join:

library(tidyverse)

df1 <- tribble(
  ~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod, ~detect,
  "BMA",  "6/1/2021",        1L, "AMRO", "Song",       1L,      0L,
  "BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,      0L,
  "BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,      0L,
  "BMC", "6/29/2021",        2L, "AMRO", "Call",       1L,      0L,
  "BMA", "6/29/2021",        2L, "WISN", "Call",       2L,      0L
  )

df2 <- tibble::tribble(
~site,      ~ddate, ~ssegment,   ~spp, ~vtype, ~tperiod,
"BMA",  "6/1/2021",        1L, "AMRO", "Call",       1L,
"BMC", "6/15/2021",        1L, "WISN", "Drum",       1L,
"BMA", "6/15/2021",        1L, "NOFL", "Song",       2L,
"BMC", "6/29/2021",        2L, "AMRO", "Drum",       1L,
"BMA", "6/29/2021",        2L, "WISN", "Call",       2L
)


bind_rows(
  df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
  df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#>   site  ddate     ssegment spp   vtype tperiod detect
#>   <chr> <chr>        <int> <chr> <chr>   <int>  <dbl>
#> 1 BMA   6/1/2021         1 AMRO  Song        1      0
#> 2 BMC   6/29/2021        2 AMRO  Call        1      0
#> 3 BMC   6/15/2021        1 WISN  Drum        1      1
#> 4 BMA   6/15/2021        1 NOFL  Song        2      1
#> 5 BMA   6/29/2021        2 WISN  Call        2      1

Created on 2021-12-08 by the reprex package (v2.0.1)
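
To see why the two pieces above can simply be stacked, here is a small sketch (data rebuilt from the question, `key_cols` being a name introduced for illustration) showing that semi_join() and anti_join() split df1 into two non-overlapping sets of rows. Neither join adds columns from df2 or duplicates rows of df1.

```r
library(dplyr)

# Sample data from the question, without the detect column
df1 <- tibble(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L)
)
df2 <- df1 %>% mutate(vtype = c("Call", "Drum", "Song", "Drum", "Call"))

key_cols <- names(df1)  # all six columns must match

matched   <- semi_join(df1, df2, by = key_cols)  # rows of df1 that occur in df2
unmatched <- anti_join(df1, df2, by = key_cols)  # rows of df1 that do not

# Every row of df1 lands in exactly one of the two pieces
nrow(matched) + nrow(unmatched) == nrow(df1)
```

Because the unmatched rows are bound first, the combined result does not keep the original row order of df1.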

Answer 3

Here is one possible and very simple solution using the data.table library.

Reprex

Code

library(data.table)

setDT(df1)
setDT(df2)

df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := TRUE][]

Output


#>    site     ddate ssegment  spp vtype tperiod detect
#> 1:  BMA  6/1/2021        1 AMRO  Song       1      0
#> 2:  BMC 6/15/2021        1 WISN  Drum       1      1
#> 3:  BMA 6/15/2021        1 NOFL  Song       2      1
#> 4:  BMC 6/29/2021        2 AMRO  Call       1      0
#> 5:  BMA 6/29/2021        2 WISN  Call       2      1

Created on 2021-12-08 by the reprex package (v2.0.1)
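
To spell out what that one-liner does: `df1[df2, on = ..., detect := ...]` is an update join. It modifies the rows of df1 that match a row of df2 by reference and leaves the other rows, and the row order, untouched. The sketch below rebuilds the data from the question and assigns `1L` instead of `TRUE` so that the value has the same type as the integer column; that substitution and the `key_cols` name are choices made here, not part of the original answer.

```r
library(data.table)

# Sample data from the question; detect starts at 0 for every row
df1 <- data.table(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- copy(df1)[, detect := NULL]
df2[c(1L, 4L), vtype := c("Call", "Drum")]  # the two rows that differ in df2

key_cols <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Update join: set detect to 1 only in the rows of df1 that match a row of df2
df1[df2, on = key_cols, detect := 1L]
df1[]
```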
