R: how to use str_replace_all() without a regular expression
I have some text data containing "[surname]", "[female name]" and "[male name]". For example,
c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today")
I'd like to remove these placeholders before analysis, so I expect to get
"I am . I am ten years old", "My father is ", "I went to school today"
But when I run the code below, the output comes back mangled. I suspect str_replace_all is treating the [ ] in the pattern as a regular expression, but I'm not entirely sure why.
> str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , "[surname]", '')
[1] "I [fl ]. I t y old" "My fth i [l ][]" "I wt to chool tody"
Does anyone know how to fix this?
Thanks in advance.
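The suspicion in the question is right: stringr treats a plain string pattern as a regular expression. Inside a regex, `[surname]` is a character class that matches any single one of the letters s, u, r, n, a, m, e, which is why those letters vanish from every string. A minimal sketch of two ways to stay within stringr: escape the brackets, or wrap the pattern in `fixed()` so it is matched literally.

```r
library(stringr)

data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")

# As a regex, "[surname]" is a character class: it deletes every s, u, r, n, a, m, e
str_replace_all("surname", "[surname]", "")  # returns ""

# Option 1: escape the brackets so they are matched as literal characters
escaped <- str_replace_all(data, "\\[surname\\]", "")

# Option 2: fixed() switches off regex interpretation entirely
literal <- str_replace_all(data, fixed("[surname]"), "")
```

Both options remove only the literal "[surname]" text; the other placeholders still need their own patterns.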
Use stringi::stri_replace_all_fixed:
library(stringi)
data <- c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today")
remove_us <- c("[female name]","[male name]","[surname]")
stri_replace_all_fixed(data, remove_us, "", vectorize_all=FALSE)
Result:
[1] "I am . I am ten years old" "My father is " "I went to school today"
See R proof.
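If you prefer to stay in stringr, here is a minimal sketch under the same `data` and `remove_us` vectors. `fixed()` matches one literal pattern per element, so `Reduce()` applies each placeholder to the whole vector in turn. This mirrors stringi's `vectorize_all = FALSE`, which applies the patterns one after another.

```r
library(stringr)

data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
remove_us <- c("[female name]", "[male name]", "[surname]")

# Left fold: start from data, remove the first placeholder literally,
# then feed that result into the removal of the next placeholder, and so on
result <- Reduce(function(s, p) str_replace_all(s, fixed(p), ""),
                 remove_us, init = data)
result
```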
However, base R's gsub is simpler:
gsub('\\[[^][]*]', '', data)
Pattern explanation (as the regex engine sees it):
\[        a literal '['
[^][]*    any character except ']' and '[' (0 or more times, matching as many as possible)
]         a literal ']'
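Note that this regex removes any text inside square brackets, not only the three placeholders. If only those exact placeholders should go, base R's `gsub()` also accepts `fixed = TRUE`, which disables regex matching. A minimal sketch comparing the two, looping over the same `remove_us` vector:

```r
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today")
remove_us <- c("[female name]", "[male name]", "[surname]")

# Regex approach: drops every [...] span, whatever it contains
by_regex <- gsub("\\[[^][]*]", "", data)

# Literal approach: fixed = TRUE matches each placeholder verbatim
by_fixed <- data
for (p in remove_us) {
  by_fixed <- gsub(p, "", by_fixed, fixed = TRUE)
}
```

On this sample both give the same result. They diverge only when the text contains other bracketed content that should be kept.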