How to set column value based on string (mis)match between two other columns?
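I want to create a match variable in a data frame that is
- 1 if the value of another variable (a string) is contained in the value of a third variable (a string)
- 0 if it is not
- NA if either string variable is NA
So far I have tried this (using the str_contains function from the sjmisc package):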
df$match[(df$str1 == "left" & str_contains(df$str2, "left"))
| (df$str1== "right" & str_contains(df$str2, "right"))] = 1
df$match[(df$str1== "left" & str_contains(df$str2, "left", logic = "not"))
| (df$str1== "right" & str_contains(df$str2, "right", logic = "not"))] = 0
df$match[is.na(df$str1)| is.na(df$str2)] = NA
However, only the NA part works. For all other rows I get match = 1, which is incorrect given the data.
Example data:
str1 | str2 | match |
---|---|---|
left | right | - |
right | somewhat left | - |
left | very left | - |
right | right | - |
right | somewhat right | - |
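The match values in this example should be 0, 0, 1, 1, 1, but they all end up as 1. I would appreciate any suggestions about what is wrong here, or an alternative way to get the result I want!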
library(tidyverse)
data <- tribble(
~str1, ~str2, ~match,
"left", "right", "-",
"right", "somewhat left", "-",
"left", "very left", "-",
"right", "right", "-",
"right", "somewhat right", "-",
NA, NA, "-"
)
data %>%
mutate(
match = ifelse(str_detect(str2, str1), 1, 0)
)
#> # A tibble: 6 × 3
#> str1 str2 match
#> <chr> <chr> <dbl>
#> 1 left right 0
#> 2 right somewhat left 0
#> 3 left very left 1
#> 4 right right 1
#> 5 right somewhat right 1
#> 6 <NA> <NA> NA
Created on 2022-05-23 by the reprex package (v2.0.0)
A base R solution:
within(df, {
match <- +mapply(grepl, str1, str2)
})
# str1 str2 match
# 1 left right 0
# 2 right somewhat left 0
# 3 left very left 1
# 4 right right 1
# 5 right somewhat right 1
# 6 <NA> <NA> NA
Data
df <- structure(list(str1 = c("left", "right", "left", "right", "right",
NA), str2 = c("right", "somewhat left", "very left", "right",
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")
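Both answers above pass str1 as a regular expression. That is fine for plain words like "left" and "right", but if str1 could ever contain regex metacharacters (for example "." or "("), a literal substring test may be safer. Below is a minimal base R sketch of that variant. The `fixed = TRUE` argument and the use of `as.integer()` are my additions and not part of the original answers. The data frame is rebuilt inline so the snippet runs on its own.

```r
# Same example data as in the Data block above
df <- data.frame(
  str1 = c("left", "right", "left", "right", "right", NA),
  str2 = c("right", "somewhat left", "very left", "right", "somewhat right", NA),
  stringsAsFactors = FALSE
)

# Row-wise test: is str1[i] contained in str2[i]?
# mapply() calls grepl(pattern = str1[i], x = str2[i]) once per row.
# fixed = TRUE (an assumption beyond the original answer) matches str1 literally
# instead of interpreting it as a regular expression.
hit <- mapply(grepl, df$str1, df$str2, MoreArgs = list(fixed = TRUE))

# as.integer() maps TRUE/FALSE/NA to 1/0/NA and drops the names mapply() attaches
df$match <- as.integer(hit)

df
```

On the example data this should give the same match column as the two answers above. The row-by-row call is the important part: each value of str1 is compared only with the str2 value in the same row.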