Separate string by start and finish strings

In R I have a series of strings, for example:

"New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"

I would like to get something like this:

New:
[1] Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)
[2] Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from "Current tracklist" (DA18018395_015)

Known:
[1] HWR in navigation entry is not read out (89412)

Note that a string may contain only "New", only "Known", neither of them, or both in a different order. Any ideas? Thanks!

You may use

x <- "New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"
# Backslashes must be doubled inside R string literals
New <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|New:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl=TRUE))
Known <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|Known:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl=TRUE))

See the R demo online.

Output:

[[1]]
[1] "Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r"                                                                                                     
[2] "Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015"

[[1]]
[1] "HWR in navigation entry is not read out (89412)"
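
In the output above, the first item keeps a trailing "\r" and the second item loses its closing parenthesis. This appears to happen because `.` still matches a carriage return, and because the trailing negative lookahead makes `.+` backtrack past `)\r` just before the `Known:` header. The following is a minimal sketch of one possible variant, not the pattern from this answer: it moves the header check in front of the item and matches `[^\r\n]+` instead of `.+`. The helper name `extract_items` is hypothetical, and it assumes the header is a plain word with no regex metacharacters.

```r
# Hypothetical helper: return the item lines that follow a "<header>:" line,
# stopping at the next "<word>:" header line or at the end of the input.
extract_items <- function(x, header) {
  pattern <- paste0(
    "(?:\\G(?!\\A)\\R+|", header, ":\\R+)", # continue after the previous item, or start at the header
    "\\K",                                  # discard what has been matched so far
    "(?!\\w+:\\R)",                         # refuse to treat the next header line as an item
    "[^\\r\\n]+"                            # one line of text, excluding CR and LF
  )
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

x <- "New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"

# Print each section that is present, in the layout requested in the question
for (header in c("New", "Known")) {
  items <- extract_items(x, header)
  if (length(items) > 0) {
    cat(header, ":\n", sep = "")
    cat(sprintf("[%d] %s\n", seq_along(items), items), sep = "")
    cat("\n")
  }
}
```

A header that does not occur in the input yields an empty character vector, so the loop simply skips that section.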

The regex used is

(?:\G(?!\A)\R+|New:\R+)\K.+(?!\R+\w+:\R)

See the regex demo online. The second regex differs from this one only in the literal Known.

Details

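  • (?:\G(?!\A)\R+|New:\R+) - the end of the previous match followed by 1+ line breaks (\G(?!\A)\R+), or (|) New: followed by 1 or more line breaks (\R+)
  • \K - the match reset operator, which discards all the text matched so far
  • .+ - 1+ characters other than line break characters, as many as possible
  • (?!\R+\w+:\R) - a negative lookahead that fails the match if, immediately to the right of the current location, there is:
    • \R+ - 1+ line breaks,
    • \w+ - 1+ word characters,
    • : - a colon,
    • \R - a line break.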
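
The individual constructs listed above can be tried in isolation. The snippet below is a minimal sketch with toy inputs that are not taken from the original post; the variable names are only illustrative, and all calls use perl=TRUE because these constructs are PCRE features.

```r
# \K: only the text after \K is reported as the match
k <- regmatches("key=value", regexpr("key=\\Kvalue", "key=value", perl = TRUE))
print(k)   # "value"

# \R: any line break sequence, with "\r\n" treated as a single break
r <- gsub("\\R", "|", "a\r\nb\nc", perl = TRUE)
print(r)   # "a|b|c"

# \G: each match must start exactly where the previous one ended,
# so matching stops at the first character that breaks the chain
g <- regmatches("ab12cd", gregexpr("\\G[a-z]", "ab12cd", perl = TRUE))[[1]]
print(g)   # "a" "b"

# (?!...): negative lookahead, "a" must not be followed by "b"
la <- grepl("a(?!b)", c("ab", "ac"), perl = TRUE)
print(la)  # FALSE TRUE
```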