Regex pattern to count lines in poems with \n or \n\n randomly used as line breaks
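I need to count the lines of 221 poems, and I tried counting the line break character \n.
However, some lines end with a double line break \n\n to start a new stanza. I want each of those counted only once. The number and position of the double line breaks in each poem are random.
Minimal working example: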
library("quanteda")
poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
# corpus() expects a single character vector, so combine the poems with c()
poems <- quanteda::corpus(c(poem1, poem2))
The resulting line count should be 5 for poem1 and 4 for poem2.
I tried stringi::stri_count_fixed(texts(poems), pattern = "\n"), but that pattern is not refined enough to handle the random double line breaks.
You can use stringr::str_count with the \R+ pattern to count the runs of consecutive line break sequences in a string. Note that the backslash must be doubled inside an R string literal ("\\R+"):
> poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
> poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
> library(stringr)
> str_count(poem1, "\\R+")
[1] 4
> str_count(poem2, "\\R+")
[1] 3
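So the line count is str_count(x, "\\R+") + 1.
The \R pattern matches any line break sequence, such as CRLF, LF, or CR. \R+ matches one or more such sequences in a row, so a double line break counts as a single separator.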
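To illustrate why \R+ is more robust than counting a fixed "\n", here is a small sketch using a made-up poem with Windows-style CRLF line endings. The variable poem_crlf is hypothetical and not part of the original data.

```r
library(stringr)

# Hypothetical poem with CRLF line endings and one blank line between stanzas
poem_crlf <- "First line\r\nSecond line\r\n\r\nThird line"

# Counting the fixed string "\n" counts every LF, including the stanza break
n_lf <- str_count(poem_crlf, fixed("\n"))

# Counting "\\R+" counts each run of line breaks once, whatever its style
n_runs <- str_count(poem_crlf, "\\R+")

n_lf        # 3
n_runs      # 2
n_runs + 1  # 3 lines
```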
poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
library(stringr)
str_count(poem1, "\\R+")
# => [1] 4
str_count(poem2, "\\R+")
# => [1] 3
## Line counts:
str_count(poem1, "\\R+") + 1
# => [1] 5
str_count(poem2, "\\R+") + 1
# => [1] 4
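The "+ 1" rule assumes that a poem neither starts nor ends with a line break and is not empty. With 221 poems that may not always hold, so the following is a minimal sketch of a vectorized helper that guards against those cases. The name count_poem_lines and the two extra test strings are assumptions made for illustration.

```r
library(stringr)

poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"

# Hypothetical helper: line count for each element of a character vector
count_poem_lines <- function(x) {
  # Remove line breaks at the very start or end so they are not counted as separators
  trimmed <- str_replace_all(x, "^\\R+|\\R+$", "")
  # An empty text has zero lines; otherwise lines = runs of line breaks + 1
  ifelse(trimmed == "", 0L, str_count(trimmed, "\\R+") + 1L)
}

# Works on a whole vector of poems at once
line_counts <- count_poem_lines(c(poem1, poem2, "Trailing newline\n", ""))
line_counts  # 5 4 1 0
```

Because str_count and str_replace_all are vectorized, the same call should work on a character vector holding all the poems.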