How to use Nokogiri to get the second image in a list?

Question

I'm using Nokogiri to scrape a website, but I'm stuck. For every div, I want to select the image inside the second li. I can't get the selector right. Can anyone help?

   <div class="carousel">
            <ul>
                <li>
                   <img alt="Columna de almacenaje " data-bigsrc="/images/cache//c/8/-c87cd5d9a33b662e766480cebec41ffc_w500_h500.jpg" data-hdsrc="http://cdn.maisonsdumonde.com//images/produits/ES/es/taille_hd/130103_1.jpg" src="http://cdn.maisonsdumonde.com//images/cache//c/8/-c87cd5d9a33b662e766480cebec41ffc_w66_h66.jpg" height="66" width="66">                       
                </li>
                <li>
                   <img alt="Columna de almacenaje " data-bigsrc="/images/cache//2/0/-204c84cf02f6b73d289c2e887b7251ce_w500_h500.jpg" data-hdsrc="http://cdn.maisonsdumonde.com//images/produits/ES/es/taille_hd/130103_2.jpg" src="http://cdn.maisonsdumonde.com//images/cache//2/0/-204c84cf02f6b73d289c2e887b7251ce_w66_h66.jpg" height="66" width="66">                        
                </li>
                <li>
                   <img alt="Columna de almacenaje " data-bigsrc="/images/cache//a/e/-aeda035baaaad22cb12e1c074d124ece_w500_h500.jpg" data-hdsrc="http://cdn.maisonsdumonde.com//images/produits/ES/es/taille_hd/130103_3.jpg" src="http://cdn.maisonsdumonde.com//images/cache//a/e/-aeda035baaaad22cb12e1c074d124ece_w66_h66.jpg" height="66" width="66">                        
                </li>
                <li>
                   <img alt="Columna de almacenaje " data-bigsrc="/images/cache//c/f/-cf424c392b338c6d39e525d6396566df_w500_h500.jpg" data-hdsrc="http://cdn.maisonsdumonde.com//images/produits/ES/es/taille_hd/130103_4.jpg" src="http://cdn.maisonsdumonde.com//images/cache//c/f/-cf424c392b338c6d39e525d6396566df_w66_h66.jpg" height="66" width="66">                        
                </li>
             </ul>
        </div>

This is my extractor:

image = agent.get(doc.parser.at('ul.carousel:nth-child(2) img')['data-hdsrc']).save

Answer 1

Using XPath:

agent.get(doc.parser.at("//div[@class='carousel']/ul/li[position()=2]/img")['data-hdsrc']).save

Using CSS:

agent.get(doc.parser.at(".carousel > ul > li:nth-of-type(2) > img")['data-hdsrc']).save

Answer 2

Try:

response = agent.get('http://www.site_url.com').parser # Nokogiri document for the fetched page
image = response.at_css('.carousel ul li:nth-of-type(2) img')
puts image['data-hdsrc'] # Print the attribute value
agent.get(image['data-hdsrc']).save # or image['src'], image['data-bigsrc']

Answer 3

Both of the answers provided work. The problem was that on some URLs some of the images are nil, so I had to check each image before continuing:

  if web.at_css('.carousel ul li:nth-child(3) img')
    image3 = agent.get(doc.parser.at(".carousel  ul li:nth-child(3) img")['data-hdsrc']).save("maisonsdumonde/f_deco-#{counter}.jpg")
  else
    puts "No third image"
  end

css

ruby-on-rails

nokogiri
