R: Populate a column based on matching row values in two different data frames
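I have two data frames, 'df1' and 'df2', that share six column names. I want to scan df2 for rows that exactly match rows of df1: if a row matches, put 1 in df1's 'detect' column; if it does not, put 0. Currently every value of 'detect' in df1 is 0, but I want it changed to 1 wherever the two data frames match exactly. It looks like this: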
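df1
site | ddate | ssegment | spp | vtype | tperiod | detect |
---|---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 0 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 0 |
BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 | 0 |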
df2
site | ddate | ssegment | spp | vtype | tperiod |
---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Call | 1 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 |
BMC | 6/29/2021 | 2 | AMRO | Drum | 1 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 |
After the scan, df1 should look like this:
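df1
site | ddate | ssegment | spp | vtype | tperiod | detect |
---|---|---|---|---|---|---|
BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 1 |
BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 1 |
BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
BMA | 6/29/2021 | 2 | WISN | Call | 2 | 1 |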
I was thinking the base R function 'merge' might be useful, but I don't quite understand how to use it. Thanks for your help!
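If df1 should keep its original row order, one possible base R sketch (an alternative to merge, not one of the posted answers) is to collapse the six key columns of each row into a single string and test membership with %in%. The helper name row_key and the "\r" separator are assumptions made for illustration; the separator must not occur anywhere in the data.

```r
# Sample data from the question (ssegment/tperiod as integers, dates kept as strings)
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Call", "Drum", "Song", "Drum", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L)
)

keys <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Hypothetical helper: paste the key columns of each row into one string
# (assumes "\r" never appears in the data)
row_key <- function(df) do.call(paste, c(df[keys], sep = "\r"))

# 1 if the df1 row appears exactly in df2, otherwise 0; df1's row order is untouched
df1$detect <- as.integer(row_key(df1) %in% row_key(df2))
df1$detect
```

Because %in% only asks "does this row occur in df2?", duplicate rows in df2 cannot multiply rows of df1, so no unique() call is needed here.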
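Start with the detect column in df2 only, then merge:
# drop the all-zero placeholder column from df1 and flag every df2 row with 1
df1$detect = NULL
df2$detect = 1
# left join on the six shared columns; unique() stops duplicate df2 rows from duplicating df1 rows
result = merge(df1, unique(df2), all.x = TRUE)
This creates the detect column with 1 where there is an exact match and NA where there is not. If needed, you can change the NAs to 0s.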
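A minimal runnable sketch of the merge approach above that also replaces the NAs with 0 and restores df1's original row order (merge() sorts its result by the key columns by default). The .row column is a hypothetical helper introduced only for this example.

```r
# Sample data from the question
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Call", "Drum", "Song", "Drum", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L)
)

df1$.row <- seq_len(nrow(df1))  # hypothetical helper column: remembers df1's order
df1$detect <- NULL               # detect will come from df2 instead
df2$detect <- 1L                 # every df2 row counts as a detection

# Left join on the six shared column names (.row and detect exist in only one table)
result <- merge(df1, unique(df2), all.x = TRUE)
result$detect[is.na(result$detect)] <- 0L  # unmatched rows: NA -> 0

# Undo merge()'s sorting, then drop the helper column
result <- result[order(result$.row), setdiff(names(result), ".row")]
rownames(result) <- NULL
result
```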
The same approach also works with dplyr:
library(dplyr)
df1 %>%
  select(-detect) %>%                          # drop the placeholder column
  left_join(
    df2 %>% mutate(detect = 1) %>% unique()    # flag df2 rows; drop duplicate rows
  )
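There are also the filtering joins anti_join and semi_join, which keep the rows of one table that have no match, or a match, in the other: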
library(tidyverse)
df1 <- tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod, ~detect,
"BMA", "6/1/2021", 1L, "AMRO", "Song", 1L, 0L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L, 0L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L, 0L,
"BMC", "6/29/2021", 2L, "AMRO", "Call", 1L, 0L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L, 0L
)
df2 <- tibble::tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod,
"BMA", "6/1/2021", 1L, "AMRO", "Call", 1L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L,
"BMC", "6/29/2021", 2L, "AMRO", "Drum", 1L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L
)
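# anti_join() keeps the df1 rows with no exact match in df2 (detect = 0),
# semi_join() keeps those with one (detect = 1); bind_rows() stacks them, unmatched rows first
bind_rows(
  df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
  df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)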
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#> site ddate ssegment spp vtype tperiod detect
#> <chr> <chr> <int> <chr> <chr> <int> <dbl>
#> 1 BMA 6/1/2021 1 AMRO Song 1 0
#> 2 BMC 6/29/2021 2 AMRO Call 1 0
#> 3 BMC 6/15/2021 1 WISN Drum 1 1
#> 4 BMA 6/15/2021 1 NOFL Song 2 1
#> 5 BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)
Here is one possible, very simple solution using the data.table library:
Reprex
- Code
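library(data.table)
setDT(df1)  # convert both data frames to data.tables by reference
setDT(df2)
# update join: set detect to 1 in the df1 rows that match a df2 row on all six columns
df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := 1L][]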
- Output
#> site ddate ssegment spp vtype tperiod detect
#> 1: BMA 6/1/2021 1 AMRO Song 1 0
#> 2: BMC 6/15/2021 1 WISN Drum 1 1
#> 3: BMA 6/15/2021 1 NOFL Song 2 1
#> 4: BMC 6/29/2021 2 AMRO Call 1 0
#> 5: BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)