R: How to use str_replace_all() without regular expressions

Question

I have some text data containing "[surname]", "[female name]" and "[male name]". For example,

c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today")

I want to remove these placeholders and expect to get

"I am . I am ten years old", "My father is ", "I went to school today"

But when I run the code below, the result comes back mangled. I suspect str_replace_all is treating the [ ] pattern as a regular expression, but I'm not entirely sure why.

> str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , "[surname]", '')

[1] "I  [fl ]. I  t y old" "My fth i [l ][]"      "I wt to chool tody"
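The mangled output is consistent with the suspicion above. As a regular expression, "[surname]" is a character class that matches any single one of the letters s, u, r, n, a, m, e, so every such letter is deleted. A minimal sketch of the literal-matching alternative, assuming stringr is installed, wraps the pattern in stringr's fixed() so the brackets are matched as ordinary characters:

```r
library(stringr)

x <- c("I am [female name]. I am ten years old",
       "My father is [male name][surname]",
       "I went to school today")

# As a regex, "[surname]" is a character class: it matches any ONE of the
# letters s, u, r, n, a, m, e, so every one of them is removed here.
str_replace_all("summer", "[surname]", "")

# fixed() makes stringr match the pattern literally, brackets included.
str_replace_all(x, fixed("[surname]"), "")
```

This handles one placeholder per call; removing all three still needs a loop or one of the approaches in the answer.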

有谁知道怎么解决吗？提前谢谢你

Answer 1

Use stringi::stri_replace_all_fixed:

library(stringi)
data <- c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") 
remove_us <- c("[female name]","[male name]","[surname]")
stri_replace_all_fixed(data, remove_us, "", vectorize_all=FALSE)

Result:

[1] "I am . I am ten years old" "My father is "             "I went to school today"

See R proof.
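If you would rather avoid an extra package, a base R sketch of the same idea applies each literal removal in turn. Here gsub() with fixed = TRUE turns off regex interpretation, and Reduce() threads the text through one call per placeholder. This is an alternative written for illustration, not part of the original answer:

```r
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
remove_us <- c("[female name]", "[male name]", "[surname]")

# Reduce() calls f(accumulated_text, next_placeholder) for each placeholder;
# fixed = TRUE makes gsub() treat "[" and "]" as ordinary characters.
result <- Reduce(function(txt, p) gsub(p, "", txt, fixed = TRUE),
                 remove_us, init = data)
result
```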

That said, gsub is simpler:

gsub('\\[[^][]*]', '', data)

See another R proof.

--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  [^][]*                   any character except: ']', '[' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ]                        ']'
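Note that this pattern removes any bracketed span, not only the three placeholders. The sketch below shows this with a made-up "[nickname]" string. It then shows one way, assuming PCRE is available through perl = TRUE, to keep a single regex but restrict it to the listed placeholders by quoting each literal with \Q...\E:

```r
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")

# The generic pattern strips ANY [ ... ] span ("[nickname]" is a made-up example).
generic <- gsub("\\[[^][]*]", "", c("Call me [nickname]", "My father is [male name]"))

# \Q...\E makes PCRE treat everything in between literally, so the
# alternation matches only these exact placeholders.
remove_us <- c("[female name]", "[male name]", "[surname]")
pattern <- paste0("\\Q", remove_us, "\\E", collapse = "|")
cleaned <- gsub(pattern, "", data, perl = TRUE)
cleaned
```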
