How to set column value based on string (mis)match between two other columns?

I want to create a match variable in a data frame.

So far I have tried this (str_contains is a function from the sjmisc package):

df$match[(df$str1 == "left"  & str_contains(df$str2, "left"))
                  | (df$str1== "right"  & str_contains(df$str2, "right"))] = 1

df$match[(df$str1== "left"  & str_contains(df$str2, "left", logic = "not")) 
                  | (df$str1== "right"  & str_contains(df$str2, "right", logic = "not"))] = 0

df$match[is.na(df$str1)| is.na(df$str2)] = NA

But only the NA part works. For the rest, every row gets 1, which is incorrect given the data.

Example data:

str1   str2            match
left   right           -
right  somewhat left   -
left   very left       -
right  right           -
right  somewhat right  -

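The match values in the example should be 0, 0, 1, 1, 1, but they all end up as 1. I would appreciate any suggestions about what is wrong here, or an alternative way to get the result I want!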

library(tidyverse)

data <- tribble(
  ~str1, ~str2, ~match,
  "left", "right", "-",
  "right", "somewhat left", "-",
  "left", "very left", "-",
  "right", "right", "-",
  "right", "somewhat right", "-",
  NA, NA, "-"
)

data %>%
  mutate(
    match = ifelse(str_detect(str2, str1), 1, 0)
  )
#> # A tibble: 6 × 3
#>   str1  str2           match
#>   <chr> <chr>          <dbl>
#> 1 left  right              0
#> 2 right somewhat left      0
#> 3 left  very left          1
#> 4 right right              1
#> 5 right somewhat right     1
#> 6 <NA>  <NA>              NA

Created on 2022-05-23 by the reprex package (v2.0.0)
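
As for why the original attempt gave 1 everywhere: as far as I can tell, sjmisc's str_contains() returns one result per pattern rather than one per element of its first argument, so with a single pattern it yields a single TRUE/FALSE that is recycled across all rows. That reading is an assumption on my part, so it is worth checking ?str_contains. Below is a minimal sketch of the same left/right logic with base grepl(), which is vectorized over the strings being searched. The names has_left, has_right and matched are illustrative only.

```r
df <- data.frame(
  str1 = c("left", "right", "left", "right", "right", NA),
  str2 = c("right", "somewhat left", "very left", "right", "somewhat right", NA),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per element of df$str2
has_left  <- grepl("left",  df$str2, fixed = TRUE)
has_right <- grepl("right", df$str2, fixed = TRUE)

# TRUE where the side named in str1 also appears in str2
matched <- (df$str1 == "left" & has_left) | (df$str1 == "right" & has_right)

df$match <- as.integer(matched)

# Rows with a missing input get NA, as in the original attempt
df$match[is.na(df$str1) | is.na(df$str2)] <- NA
df
```

This keeps the structure of the original code but computes the containment test row by row, so the first five rows come out as 0, 0, 1, 1, 1.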

A base R solution:

within(df, {
  match <- +mapply(grepl, str1, str2)
})

#    str1           str2 match
# 1  left          right     0
# 2 right  somewhat left     0
# 3  left      very left     1
# 4 right          right     1
# 5 right somewhat right     1
# 6  <NA>           <NA>    NA

Data
df <- structure(list(str1 = c("left", "right", "left", "right", "right", 
NA), str2 = c("right", "somewhat left", "very left", "right", 
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")
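
To make the one-liner above easier to follow: mapply() calls grepl(pattern, x) once per row, taking the pattern from str1 and the text from str2, and the unary + turns the logical results into 0/1. The sketch below spells out the same idea. Two things in it are my own additions rather than part of the answer: fixed = TRUE, which treats str1 as literal text instead of a regular expression, and USE.NAMES = FALSE, which drops the element names mapply() would otherwise attach. The name row_match is illustrative.

```r
df <- data.frame(
  str1 = c("left", "right", "left", "right", "right", NA),
  str2 = c("right", "somewhat left", "very left", "right", "somewhat right", NA),
  stringsAsFactors = FALSE
)

# One grepl() call per row: is the text of str1 found inside str2?
row_match <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  df$str1, df$str2,
  USE.NAMES = FALSE
)

# Convert TRUE/FALSE to 1/0, the same job the unary + does above
df$match <- as.integer(row_match)
df
```

Literal matching only matters if str1 could contain regex metacharacters such as "." or "("; for plain words like "left" and "right" both variants behave the same.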