Trim pattern in a text between \n\n\n\n
I am cleaning text in R. My text has the following format:
but he could not avoid the subject FULLSTOP \n\n\n\n\nsimilar pieces
by the author\n\n\nlife is great 13022015\nreal men don t eath quiche
22042013\nback to the future 01072012\n\n\n\n and as he takes the
stage here wednesday night to rally democrats around hillary clinton
mr FULLSTOP obama will revisit his own promise to guide the nation
into an era of reconciliation and unity harking back to the themes
that propelled his improbable rise but that seem even more out of
reach today FULLSTOP \n\n\n\n\nobama at convention to lay out stakes for
a divided nation \n\n\n\n we get frustrated with political gridlock worry
about racial divisions are shocked and saddened by the madness of
orlando or nice mr FULLSTOP
I am trying to get rid of
\n\n\n\n\nsimilar pieces by the author\n\n\nlife is great 13022015\nreal men don t eath quiche 22042013\nback to the future 01072012\n\n\n\n
so as to get something like
but he could not avoid the subject FULLSTOP and as he takes the stage
here wednesday night to rally democrats around hillary clinton mr
FULLSTOP obama will revisit his own promise to guide the nation into
an era of reconciliation and unity harking back to the themes that
propelled his improbable rise but that seem even more out of reach
today FULLSTOP \n\n\n\n\nobama at convention to lay out stakes for a
divided nation \n\n\n\n we get frustrated with political gridlock
worry about racial divisions are shocked and saddened by the madness
of orlando or nice mr FULLSTOP
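I have tried
gsub("\\n{3,}(similar pieces)?.*\\n{3,}", "", my_string)
or
gsub("\\n{3,}(similar pieces)?.*?\\n{3,}", "", my_string)
but it either trims too much or does not work.
Any help (with an explanation of what I am doing wrong and why the alternative works) would be greatly appreciated.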
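You need to match everything from the first run of 5 newlines up to the first run of 4 newlines that follows it.
I suggest using the regex " *\n{5}.*?\n{4} *" (the quotes only make the leading and trailing spaces visible):
" *" - zero or more literal spaces
"\n{5}" - 5 newlines
".*?" - zero or more characters, as few as possible, up to the first...
"\n{4}" - 4 LF symbols
" *" - zero or more literal spaces (just to trim the match)
and replacing the match with a space.
Use sub, because you only need 1 replacement:
sub(" *\n{5}.*?\n{4} *", " ", s)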
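As a rough check of the pattern above, here is a minimal sketch on a shortened, made-up stand-in for the text in the question (the variable names s, fixed, greedy and fixed_pcre are only illustrative). It also runs the greedy attempt from the question for comparison, and shows a perl = TRUE variant, where "." does not match a newline unless the (?s) flag is set.

```r
# Shortened, hypothetical stand-in for the text in the question.
s <- paste0(
  "avoid the subject FULLSTOP ",
  "\n\n\n\n\nsimilar pieces by the author\n\n\nlife is great 13022015",
  "\nback to the future 01072012\n\n\n\n",
  " and as he takes the stage FULLSTOP ",
  "\n\n\n\n\nobama at convention \n\n\n\n we get frustrated"
)

# Suggested pattern: lazy match from the first run of 5 newlines to the
# first run of 4 newlines after it. sub() replaces only the first match,
# so the later "obama at convention" block is left alone.
fixed <- sub(" *\n{5}.*?\n{4} *", " ", s)

# Greedy attempt from the question: ".*" keeps going to the LAST run of
# 3 or more newlines, so the paragraph in between is removed as well.
greedy <- gsub("\\n{3,}(similar pieces)?.*\\n{3,}", "", s)

# Same idea with the PCRE engine: (?s) lets "." match newlines.
fixed_pcre <- sub("(?s) *\n{5}.*?\n{4} *", " ", s, perl = TRUE)

cat(fixed_pcre, "\n")
```

As far as I can tell, this also explains the attempts in the question: the greedy .* runs to the last run of newlines, while {3,} also accepts the 3-newline run inside the block you want to drop, so the lazy variant stops too early and gsub then carries on matching further along the string.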