Only grep img tags that contain a keyword, but not img tags that don't?

Question

Using grep/regex, I'm trying to pull img tags out of a file. I only want img tags whose source contains 'photobucket'. I don't want img tags that don't contain photobucket.

Want:

<img src="/photobucket/img21.png">

Do not want:

<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>

What I've tried:

(<img.*?photobucket.*?>)

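This doesn't work. It pulls in the second example under "Do not want", because that line has a 'photobucket' followed by a closing bracket. How can I check for 'photobucket' only up to the first closing bracket (>), and if the tag doesn't contain photobucket, ignore it and move on?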

'photobucket' can appear at different positions in the string.
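To see why the attempt over-matches, here is a minimal sketch that runs the same pattern through Python's re module, used here as a stand-in for a Perl-style engine such as grep -P (an assumption; the question doesn't say which engine was used). The lazy .*? still lets "." match ">", so the search walks past the end of the first tag until it finds 'photobucket'.

```python
import re

# The pattern from the question. ".*?" is lazy, but "." still matches ">",
# so nothing stops the match at the end of the <img> tag.
ATTEMPT = re.compile(r'(<img.*?photobucket.*?>)')

unwanted = '<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>'

match = ATTEMPT.search(unwanted)
# The match runs from "<img" through the paragraph text up to the ">" of "</p>",
# i.e. the entire line.
print(match.group(1))
```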

Answer 1

Try the following:

<img[^>]*?photobucket[^>]*?>

This way the regex can't run past a '>'.
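A minimal sketch of this pattern against the question's examples, again using Python's re as a stand-in engine (an assumption). The helper name photobucket_tags() is hypothetical and only wraps findall().

```python
import re

# Answer 1's pattern: "[^>]" can never cross ">", so "photobucket" has to
# appear before the tag's own closing bracket.
TAG_WITH_KEYWORD = re.compile(r'<img[^>]*?photobucket[^>]*?>')

def photobucket_tags(text):
    """Hypothetical helper: return every <img ...> tag in text that mentions photobucket."""
    return TAG_WITH_KEYWORD.findall(text)

want = '<img src="/photobucket/img21.png">'
do_not_want = [
    '<img src="/imgs/test.jpg">',
    '<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>',
]

print(photobucket_tags(want))      # ['<img src="/photobucket/img21.png">']
for line in do_not_want:
    print(photobucket_tags(line))  # []
```

Note that this pattern accepts 'photobucket' anywhere inside the tag (for example in an alt attribute), not only in src.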

Answer 2

Try this pattern:

<img.*src=\"[/a-zA-Z0-9_]+photobucket[/a-zA-Z0-9_]+\.\w+\".*>

I'm not sure which characters are allowed in the folder names, but you can just add them to the [] ranges before and after "photobucket".
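A minimal sketch of this pattern in Python's re (an assumption; the pattern is copied unchanged). It matches the wanted tag, but the leading and trailing .* are greedy and can cross ">", so two tags on one line come back as a single match.

```python
import re

# Answer 2's pattern, unchanged. The [/a-zA-Z0-9_] ranges around
# "photobucket" restrict which characters may appear in the path.
ANSWER2 = re.compile(r'<img.*src=\"[/a-zA-Z0-9_]+photobucket[/a-zA-Z0-9_]+\.\w+\".*>')

wanted = '<img src="/photobucket/img21.png">'
print(ANSWER2.search(wanted).group(0))    # the whole tag

print(ANSWER2.search('<img src="/imgs/test.jpg">'))  # None

# Caveat: the greedy ".*" parts can cross ">", so both tags below are
# returned as one match starting at the first "<img".
two_tags = '<img src="/a.png"><img src="/photobucket/b.png">'
print(ANSWER2.search(two_tags).group(0))
```

If you run it with GNU grep instead, + is only a quantifier in extended (grep -E) or Perl (grep -P) mode; in the default basic syntax it matches a literal plus sign.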

Answer 3

Just add a negation of the > character:

(<img[^>]*?photobucket.*?>)

https://regex101.com/r/tZ9lI9/2
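Only the part before 'photobucket' needs the [^>] negation. A minimal sketch in Python's re (an assumption) showing that the trailing lazy .*? then stops at the first ">" after the keyword, which is the tag's own closing bracket, as long as no attribute value contains ">".

```python
import re

# Answer 3: "[^>]*?" keeps "<img ... photobucket" inside one tag; the lazy
# ".*?" after the keyword then stops at the next ">", the tag's own close.
ANSWER3 = re.compile(r'(<img[^>]*?photobucket.*?>)')

line = '<img src="/photobucket/img21.png" alt="x"><p>more</p>'
print(ANSWER3.findall(line))      # ['<img src="/photobucket/img21.png" alt="x">']

unwanted = '<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>'
print(ANSWER3.findall(unwanted))  # []
```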

Answer 4

grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile

-o returns only the matching parts. Broken down:

<img          # Start with <img
[^>]*         # Zero or more of "not >"
src="         # start of src attribute
[^"]*         # Zero or more of "not a quote"
photobucket   # Match photobucket
[^>]*         # Zero or more of "not >"
>             # Closing angle bracket

For the input file

<img src="/imgs/test.jpg">
<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>
<img src="/photobucket/img21.png">
<img alt="photobucket" src="/something/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
<img src="/something/img21.png" alt="photobucket">

this returns

$ grep -o '<img[^>]*src="[^"]*photobucket[^>]*>' infile
<img src="/photobucket/img21.png">
<img alt="something" src="/photobucket/img21.png">
<img src="/photobucket/img21.png" alt="something">
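The same run can be reproduced outside grep. This sketch assumes Python's re behaves like grep for these plain constructs; the pattern is written in re.VERBOSE mode so each piece carries the comment from the breakdown above, and the helper grep_o() is hypothetical: like grep -o, it collects every match on every line.

```python
import re

# The grep -o pattern from this answer; re.VERBOSE ignores the whitespace
# and the "#" comments, so this is identical to the one-line form.
PHOTOBUCKET_SRC = re.compile(r'''
    <img          # start with <img
    [^>]*         # zero or more of "not >"
    src="         # start of the src attribute
    [^"]*         # zero or more of "not a quote", so we stay inside src
    photobucket   # match photobucket
    [^>]*         # zero or more of "not >"
    >             # closing angle bracket
''', re.VERBOSE)

def grep_o(lines):
    """Hypothetical helper: return every match on every line, like grep -o."""
    matches = []
    for line in lines:
        matches.extend(m.group(0) for m in PHOTOBUCKET_SRC.finditer(line))
    return matches

SAMPLE = [
    '<img src="/imgs/test.jpg">',
    '<img src="/imgs/thiswillgetpulledtoo.jpg"><p>We like photobucket</p>',
    '<img src="/photobucket/img21.png">',
    '<img alt="photobucket" src="/something/img21.png">',
    '<img alt="something" src="/photobucket/img21.png">',
    '<img src="/photobucket/img21.png" alt="something">',
    '<img src="/something/img21.png" alt="photobucket">',
]

for match in grep_o(SAMPLE):
    print(match)
```

Because [^"]* cannot cross a quote, 'photobucket' in an alt attribute is not enough; it has to be inside the src value.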

The non-greedy .*? only works with the -P option (Perl regular expressions).

regex

grep

bbedit