How to check for the word logo in URLs?

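I want to select logo image URLs from an HTML string. I am assuming that a logo image URL contains the text 'logo' somewhere in the URL.

I need a regular expression that selects image URLs from a given HTML string. A logo URL will contain the text 'logo' in its path.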

/(https?:\/\/(?:www\.)?[\w+-_.0-9@\/]+logo.(?:png|jpg|jpeg))/i
["https://static.infragistics.com/marketing/Website/home/espn-logo.png", "https://static.infragistics.com/marketing/Website/home/mondelez-logo.png", "https://static.infragistics.com/marketing/Website/home/nielsen-logo.png", "https://static.infragistics.com/marketing/Website/home/united-logo.png", "https://static.infragistics.com/marketing/Website/home/merrill-lynch-logo.png", "https://static.infragistics.com/marketing/Website/home/dell-logo.png", "https://static.infragistics.com/marketing/Website/home/intel-logo.png", "https://static.infragistics.com/marketing/Website/home/prudential-logo.png", "https://static.infragistics.com/marketing/Website/home/mcdonalds-logo.png"]

The text 'logo' can appear anywhere in the URL.

I need a regular expression that selects image URLs containing the text 'logo'.

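It may be a good idea to relax the constraints in our expression, possibly even dropping the word boundaries around logo, with an expression similar to: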

(?i)^(?=.*logo)https?:\/\/\S+(?:png|jpe?g|gif|tiff)$

where

(?=.*logo)

simply checks whether logo appears anywhere in the URL. It is a zero-width lookahead, so it consumes no characters.


If we only want to check for the word logo in the image file name, as in:

espn-logo.png
espn-logos.png

we would move our positive lookahead forward, to just after the last slash, for example:

(?i)^https?:\/\/\S+\/(?=.*logo).*(?:png|jpe?g|gif|tiff)$
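One caveat with the expression above: `.*` after the slash can itself run across further slashes, so a URL whose directory (rather than file name) contains logo may still pass. A minimal Ruby sketch of a stricter variant follows, assuming we want logo in the final path segment only. The constant name, the method name, and the example.com URLs are hypothetical, and the extra `\.` before the extension is my addition, not part of the original expression.

```ruby
# Sketch: require "logo" in the final path segment only.
# [^\/\s]* cannot cross a slash, so both the lookahead and the
# file-name part are confined to the text after the last slash.
LOGO_IN_FILENAME =
  /\Ahttps?:\/\/\S+\/(?=[^\/\s]*logo)[^\/\s]*\.(?:png|jpe?g|gif|tiff)\z/i

# Hypothetical helper: true when the file name itself contains "logo".
def logo_in_filename?(url)
  url.match?(LOGO_IN_FILENAME)
end

# "logo" is in the file name.
puts logo_in_filename?('https://example.com/home/espn-logo.png')
# "logo" is only in a directory name.
puts logo_in_filename?('https://example.com/logo/banner.png')
```

Because no word boundary is used, names such as espn-logos.png still count as a match, which is in line with the relaxed constraints described above.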

Our desired image extensions go into this non-capturing group:

(?:png|jpe?g|gif|tiff|svg)
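If the list of extensions changes often, the group can be built from an array instead of being edited by hand. The following is only a sketch: `EXTENSIONS`, `EXT_SOURCE`, `LOGO_URL`, and `logo_image_url?` are hypothetical names, and the URLs are made up. It interpolates the `.source` string rather than the Regexp object itself, because an interpolated Regexp carries its own flags and would switch case-insensitivity off inside the group.

```ruby
# Hypothetical list of accepted extensions; adjust as needed.
EXTENSIONS = %w[png jpg jpeg gif tiff svg].freeze

# Regexp.union escapes each string and joins them with "|".
# .source gives the bare pattern text: "png|jpg|jpeg|gif|tiff|svg".
EXT_SOURCE = Regexp.union(EXTENSIONS).source

# Same idea as the expression above, with the extension group generated.
LOGO_URL = /\A(?=.*logo)https?:\/\/\S+\.(?:#{EXT_SOURCE})\z/i

# Hypothetical helper: true for image URLs that contain "logo".
def logo_image_url?(url)
  url.match?(LOGO_URL)
end

puts logo_image_url?('https://example.com/img/site-logo.SVG')
puts logo_image_url?('https://example.com/img/site-logo.bmp')
```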

Test

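# In Ruby, ^ and $ match at every line boundary, so scan checks each line on its own.
# The trailing i makes the match case-insensitive (a trailing s would select an encoding instead).
re = /^(?=.*logo)https?:\/\/\S+(?:png|jpe?g|gif|tiff)$/i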
str = 'https://static.infragistics.com/marketing/Website/home/espn-logo.png
https://static.infragistics.com/marketing/Website/home/mondelez-logo.gif
https://static.infragistics.com/marketing/Website/home/nielsen-logo.jpg
https://static.infragistics.com/marketing/Website/home/united-logo.jpeg
https://static.infragistics.com/marketing/Website/home/merrill-lynch-logo.PNG
https://static.infragistics.com/marketing/Website/home/dell-logo.TIFF
https://static.infragistics.com/marketing/Website/home/intel-logo.gif
https://static.infragistics.com/marketing/Website/home/prudential-logo.png
https://static.infragistics.com/marketing/Website/home/mcdonalds-logo.GIF
https://static.infragistics.com/marketing/Website/home/mcdonalds-alogo.GIF
https://static.infragistics.com/marketing/Website/home/mcdonalds-logos.GIF'

str.scan(re) do |match|
    puts match.to_s
end

The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it; there you can also watch how it matches against some sample inputs.


RegEx Circuit

jex.im visualizes regular expressions.

Edit

For cases where the input contains other URL instances, we would usually add more constraints for the edge cases, such as:

(?i)(?<=")\s*(?:https?)?:\/\/[^"]+\/(?=[^"]*logo)[^"]*(?:png|jpe?g|gif|tiff)\s*(?=")
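As a rough illustration of how this expression might be applied to a whole HTML string in Ruby, here is a small sketch. The `(?i)` prefix is moved to Ruby's `/i` flag; `SAMPLE_HTML`, `QUOTED_LOGO_URL`, and `logo_urls` are hypothetical names, and the markup is invented for the example.

```ruby
# Hypothetical HTML fragment, for illustration only.
SAMPLE_HTML = <<~HTML_DOC
  <img src="https://example.com/home/espn-logo.png" alt="ESPN">
  <img src="https://example.com/home/banner.png" alt="Banner">
  <a href="https://example.com/about">About</a>
  <img src="http://example.com/img/Logo-small.JPG">
HTML_DOC

# The expression above: the match must sit between double quotes,
# and "logo" must appear somewhere after a slash in the URL.
QUOTED_LOGO_URL =
  /(?<=")\s*(?:https?)?:\/\/[^"]+\/(?=[^"]*logo)[^"]*(?:png|jpe?g|gif|tiff)\s*(?=")/i

# scan returns every match; strip drops whitespace the \s* parts may have captured.
def logo_urls(html)
  html.scan(QUOTED_LOGO_URL).map(&:strip)
end

puts logo_urls(SAMPLE_HTML)
```

With this sample, only the first and last `src` values are expected to be returned, since the other two URLs do not contain logo.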



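I think it may be easier to collect the image URLs first and then check whether the image name contains \blogo\b. Otherwise, the expression may become complicated.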
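A minimal sketch of that two-step approach in Ruby might look like the following. It assumes the URLs are absolute http(s) URLs that `URI.parse` accepts; `IMAGE_URL`, `image_urls`, `logo_image?`, and `logo_image_urls` are hypothetical names, and the sample URLs are invented.

```ruby
require 'uri'

# Step 1: a deliberately loose pattern that collects image URLs.
IMAGE_URL = /https?:\/\/[^\s"'<>]+\.(?:png|jpe?g|gif|tiff)/i

def image_urls(html)
  html.scan(IMAGE_URL)
end

# Step 2: look for the standalone word "logo" in the file name only.
# Note that "_" counts as a word character, so "espn_logo.png" would not match.
def logo_image?(url)
  file_name = File.basename(URI.parse(url).path)
  file_name.match?(/\blogo\b/i)
end

def logo_image_urls(html)
  image_urls(html).select { |url| logo_image?(url) }
end

html = '<img src="https://example.com/home/espn-logo.png">' \
       '<img src="https://example.com/home/espn-logos.png">' \
       '<img src="https://example.com/logo/banner.gif">'
puts logo_image_urls(html)
```

Splitting the work keeps each expression short: the first only has to recognize an image URL, and the second only has to inspect a file name.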