How do I use grep to find a phrase that ends with a regular expression?

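I have a large text file that I am trying to split up into a CSV file. It currently contains no line breaks, but every line I want to separate out ends with the regular expression url is \S+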

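I am using BBEdit's find dialog and want to extract these lines. I first tried adding a line break after each match of the regex, but if I put url is \S+\n in the replace field, it is inserted literally and my URL is lost. Some of the expressions I have tried: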

\burl is \S+
\b.*url is \S+ 
$url is \S+ 
.*$url is \S+ 
url is \S+ $
url is \S+$

The syntax of each line is:

<message>, post has <#> likes, profile is <name>, url is <characters> 

So an example of the document is:

message 1 here, post has 37 likes, profile is name1, url is 8gjEobL1U4 message 2, some messages have commas in them, post has 182 likes, profile is name2, url is 89PI4JOscv here is another message, post has 105 likes, profile is someoneelse, url is 89baAOzDLj
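
Since the end goal is a CSV file, the line syntax above can also be written as a single regular expression with named groups. The following is only a minimal Python sketch, not something from the original post: the column names (message, likes, profile, url) and the to_csv helper are assumptions, and it presumes every record follows the syntax exactly.

```python
import csv
import io
import re

# One record: <message>, post has <#> likes, profile is <name>, url is <characters>
# The group names are illustrative assumptions, not part of the original data.
FIELDS = re.compile(
    r"(?P<message>.*?), post has (?P<likes>\d+) likes, "
    r"profile is (?P<profile>.*?), url is (?P<url>\S+)\s*"
)

def to_csv(text):
    """Return CSV text with one row per record found in `text`."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["message", "likes", "profile", "url"])
    for m in FIELDS.finditer(text):
        # csv.writer quotes any message that contains commas.
        writer.writerow([m["message"].strip(), m["likes"], m["profile"], m["url"]])
    return out.getvalue()

sample = (
    "message 1 here, post has 37 likes, profile is name1, url is 8gjEobL1U4 "
    "message 2, some messages have commas in them, post has 182 likes, "
    "profile is name2, url is 89PI4JOscv "
    "here is another message, post has 105 likes, profile is someoneelse, "
    "url is 89baAOzDLj"
)

print(to_csv(sample))
```

The lazy .*? for the message stops at the first ", post has <digits> likes", so commas inside a message stay in the message field.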

With GNU grep:

grep -oP '.*? url is [^ ]+ *' file

Output:

message 1 here, post has 37 likes, profile is name1, url is 8gjEobL1U4 
message 2, some messages have commas in them, post has 182 likes, profile is name2, url is 89PI4JOscv 
here is another message, post has 105 likes, profile is someoneelse, url is 89baAOzDLj
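
If GNU grep is not available (the -P option is a GNU extension), the same lazy-match idea can be sketched in Python. This is an assumed equivalent rather than part of the answer above; split_records and sample are illustrative names.

```python
import re

# Lazily match everything up to and including the next "url is <non-spaces>".
RECORD = re.compile(r".*? url is \S+")

def split_records(text):
    """Return each record as its own string, without surrounding spaces."""
    return [m.group(0).strip() for m in RECORD.finditer(text)]

sample = (
    "message 1 here, post has 37 likes, profile is name1, url is 8gjEobL1U4 "
    "message 2, some messages have commas in them, post has 182 likes, "
    "profile is name2, url is 89PI4JOscv "
    "here is another message, post has 105 likes, profile is someoneelse, "
    "url is 89baAOzDLj"
)

for record in split_records(sample):
    print(record)
```

Each match starts where the previous one ended, which is why the lazy .*? yields one record per match instead of swallowing the whole file.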

In Notepad++ I can use:

  • Find what: url is.+? <- there is a space after the question mark
  • Replace with: $0\n

I am assuming you want to split after the actual URL, not after the word "url".
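
The same find-and-replace idea, keeping the matched text and appending a line break, can be sketched in Python with re.sub. This is an assumption-laden illustration, not the Notepad++ behaviour itself: break_after_url is a hypothetical helper, and the trailing space is made optional here so that the last record is also terminated.

```python
import re

def break_after_url(text):
    """Insert a newline after every 'url is <non-spaces>' occurrence."""
    # \1 puts the matched "url is ..." text back; the optional space after it is dropped.
    return re.sub(r"(url is \S+) ?", r"\1\n", text)

sample = (
    "message 1 here, post has 37 likes, profile is name1, url is 8gjEobL1U4 "
    "message 2, some messages have commas in them, post has 182 likes, "
    "profile is name2, url is 89PI4JOscv "
    "here is another message, post has 105 likes, profile is someoneelse, "
    "url is 89baAOzDLj"
)

print(break_after_url(sample), end="")
```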