How to count the number of occurrences of a particular pattern in a string in R?

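I am trying to count the number of | characters in a string. This is my code, but it gives the wrong answer: 32 instead of 2. Why does this happen, and how do I get a function that returns 2? Thanks!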

> levels
[1] "Completely|Partially|Not at all"
> str_count(levels, '|')
[1] 32

Also, how can I split the string on the | character? I would like the output to be a character vector of length 3: 'Completely', 'Partially', 'Not at all'.

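| has a special meaning in regular expressions: it is the alternation ("or") operator. Left unescaped, the pattern '|' reads as "empty string or empty string", which matches at every position in the string: 31 characters give 32 positions, hence the 32. Escape it with a backslash; inside an R string literal the backslash itself must be doubled:

stringr::str_count("Completely|Partially|Not at all", "\\|")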
# [1] 2
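
If the double backslash feels error-prone, a fixed (non-regex) match avoids escaping altogether. The following is a minimal base R sketch, assuming the same input string as in the question; the variable names `matches` and `pipe_count` are only illustrative.

```r
levels <- "Completely|Partially|Not at all"

# fixed = TRUE treats "|" as a literal character, not as regex alternation
matches <- gregexpr("|", levels, fixed = TRUE)

# regmatches() extracts the matched substrings; lengths() counts them per element
pipe_count <- lengths(regmatches(levels, matches))
print(pipe_count)
# [1] 2
```

This stays in base R, so it may be useful when stringr is not available.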

To illustrate what | is normally used for, let's count the occurrences of el and al:

stringr::str_count("Completely|Partially|Not at all", "al")
# [1] 2
stringr::str_count("Completely|Partially|Not at all", "el")
# [1] 1
stringr::str_count("Completely|Partially|Not at all", "el|al")
# [1] 3

To find a literal | symbol, it needs to be escaped.

To split the string on the | symbol, we can use strsplit (base R) or stringr::str_split:

strsplit("Completely|Partially|Not at all", "\\|")
# [[1]]
# [1] "Completely" "Partially"  "Not at all"

It returns a list because the argument may be a vector with more than one element. This is perhaps clearer with an example:

vec <- c("Completely|Partially|Not at all", "something|else")
strsplit(vec, "\\|")
# [[1]]
# [1] "Completely" "Partially"  "Not at all"
# [[2]]
# [1] "something" "else"     
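
Because the separator is a literal character rather than a pattern, strsplit can also be called with fixed = TRUE, which removes the need for escaping. This is a small sketch along those lines; `parts` and `pieces_per_element` are illustrative names, not part of the original answer.

```r
vec <- c("Completely|Partially|Not at all", "something|else")

# fixed = TRUE splits on the literal "|" instead of interpreting it as a regex
parts <- strsplit(vec, "|", fixed = TRUE)

# lengths() gives the number of pieces in each list element
pieces_per_element <- lengths(parts)

print(parts)
print(pieces_per_element)
# [1] 3 2
```

The result is still a list with one element per input string, exactly as in the regex version.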

The pipe character | is a regex metacharacter and needs to be escaped:

library(stringr)
levels <- "Completely|Partially|Not at all"
str_count(levels, '\\|')
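
stringr also provides the fixed() modifier, which marks a pattern as a literal string, so no escaping is needed. The sketch below assumes stringr is installed; `n_pipes` and `pieces` are illustrative names.

```r
library(stringr)

levels <- "Completely|Partially|Not at all"

# fixed("|") matches the literal pipe character rather than regex alternation
n_pipes <- str_count(levels, fixed("|"))

# str_split() returns a list; [[1]] picks the pieces for the single input string
pieces <- str_split(levels, fixed("|"))[[1]]

print(n_pipes)
# [1] 2
print(pieces)
# [1] "Completely" "Partially"  "Not at all"
```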

Another general trick you can use here is to compare the length of the input with its length after stripping out all the pipes:

nchar(levels) - nchar(gsub("|", "", levels, fixed=TRUE))
[1] 2
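
The length-difference trick can be wrapped in a small vectorized helper. This is only a sketch: `count_literal` is a hypothetical name, and dividing by nchar(literal) is an assumption added so that multi-character literals are counted correctly (non-overlapping occurrences only).

```r
# Hypothetical helper: count non-overlapping occurrences of a literal string
count_literal <- function(x, literal) {
  # Remove every occurrence of the literal, treating it as plain text
  stripped <- gsub(literal, "", x, fixed = TRUE)
  # Each removed occurrence shortens the string by nchar(literal) characters
  (nchar(x) - nchar(stripped)) / nchar(literal)
}

counts <- count_literal(
  c("Completely|Partially|Not at all", "no pipes", "something|else"),
  "|"
)
print(counts)
# [1] 2 0 1
```

Since nchar and gsub are both vectorized, the helper works on a whole character vector at once.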

Addendum: using strsplit

unlist(strsplit(levels, "\\|"))

[1] "Completely" "Partially"  "Not at all"