`str_detect()` and `map()` to iterate through many string detections

Question

My data is in the following format (the code to create it is at the end of the question).

#> df
#>  id amount description
#>   1     10 electricity
#>   2    100        rent
#>   3      4        fees

I would like to categorise transactions (rows) according to whether the description contains certain strings.

For example:

library(tidyverse)
df <- df %>% 
  mutate(category = ifelse(str_detect(description, "elec"), "bills", description))

which gives:

#>   id amount description category
#> 1  1     10 electricity    bills
#> 2  2    100        rent     rent
#> 3  3      4        fees     fees

I would like to define a vector of keywords and their associated categories, like this:

keywords <- c(electric = "bills",
              rent = "bills",
              fees = "misc")

What is the next step to create a category column with the correct labels?

Desired output:

#>   id amount description category
#> 1  1     10 electricity    bills
#> 2  2    100        rent    bills
#> 3  3      4        fees     misc

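I tried map2_df, but I must be doing something wrong, because the code below returns three versions of df stacked on top of each other (one full copy per keyword):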

categorise_transactions <- function(keyword, category){df <- df %>% 
  mutate(category = ifelse(str_detect(description, keyword), category, description))}

library(purrr)
map2_df(names(keywords), keywords, categorise_transactions)

Data input code:

df <- data.frame(
  stringsAsFactors = FALSE,
                id = c(1L, 2L, 3L),
            amount = c(10L, 100L, 4L),
       description = c("electricity", "rent", "fees")
)
df

Answer 1

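str_replace_all almost gives you what you need, but it replaces only the matched substring, so "electricity" becomes "billsity":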

library(dplyr)
library(stringr)

str_replace_all(df$description, keywords)
#[1] "billsity" "bills"    "misc"

However, as @Russ Thomas suggested, case_when gives you what you need:

library(dplyr)
library(stringr)

df %>%
  mutate(category = case_when(str_detect(description, 'electric') ~ 'bills', 
                              str_detect(description, 'rent') ~ 'bills', 
                              str_detect(description, 'fees') ~ 'misc'))  


#  id amount description category
#1  1     10 electricity    bills
#2  2    100        rent    bills
#3  3      4        fees     misc
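
The case_when call above spells out each keyword by hand. If the goal is to drive the labelling from the keywords vector itself, one possible sketch is shown below. The helper name categorise is made up for illustration, and the rule that the first matching keyword wins, with NA for descriptions that match nothing, is an assumption rather than something stated in the question. Note that the names of keywords are used as regular-expression patterns by str_detect.

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L),
  amount = c(10L, 100L, 4L),
  description = c("electricity", "rent", "fees")
)

# Names are the patterns to look for, values are the category labels
keywords <- c(electric = "bills",
              rent = "bills",
              fees = "misc")

# Hypothetical helper: for each description, test every keyword and
# return the label of the first one that matches (NA if none match)
categorise <- function(description, keywords) {
  map_chr(description, function(d) {
    hit <- str_detect(d, names(keywords))
    if (any(hit)) unname(keywords[hit][1]) else NA_character_
  })
}

result <- df %>%
  mutate(category = categorise(description, keywords))

result
```

Unlike the map2_df attempt, this iterates over the descriptions rather than over copies of the data frame, so the result keeps the original three rows and only gains the category column.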
