How to scrape icon link of image using mechanize gem
I have a URL from which I need to scrape all the images using the mechanize
gem, but some of the image URLs are in rel=icon links.
I need to get the image URL from:
<link rel="icon" href="https://mywebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png" sizes="32x32">
This is the code I tried. It only scrapes the images in <img> tags. How can I combine both into one result?
require 'mechanize'
url = "https://mywebsite.com/"
agent = Mechanize.new
page = agent.get(url)
page.images.each do |image|
  puts image # prints every image found in an <img> tag
end
I looked at Mechanize Page links, but it returns only anchors.
I tried XPath:
page.xpath('//link[contains(@rel, "icon")]').each do |icon|
  p icon.attr('href')
end
and got:
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-192x192.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-180x180.png"
Here is a Replit that returns all the images:
page.search('link').each do |link|
  href = link['href'].to_s
  # keep only hrefs that mention .gif, .png, .jpg or .jpeg
  puts href if href.match?(/\.(gif|png|jpe?g)/)
end
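The check in the Replit code matches an extension anywhere in the href (so a URL like /logo.png.html would pass) and is case-sensitive (so .PNG would not). A possible alternative, assuming you only care about the extension of the URL path, is to parse each href with Ruby's standard URI library and compare File.extname against a list. The names image_href? and IMAGE_EXTENSIONS are hypothetical, and .svg, .ico and .webp are additions that were not in the original list.

```ruby
require 'uri'

# Hypothetical list of extensions treated as images (.svg/.ico/.webp are additions).
IMAGE_EXTENSIONS = %w[.gif .png .jpg .jpeg .svg .ico .webp].freeze

# Hypothetical helper: true if the path part of href ends in an image extension.
# Query strings such as "?ver=2" are ignored because only URI#path is checked.
def image_href?(href)
  return false if href.nil? || href.empty?

  path = URI(href).path.to_s # opaque URIs (e.g. data:) have a nil path
  IMAGE_EXTENSIONS.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false # unparseable hrefs are treated as non-images
end
```

With a fetched page it could be used as page.search('link').map { |link| link['href'] }.select { |href| image_href?(href) }.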