CSS selector of link to the next page returns empty list in Scrapy shell

Question

I am new to Scrapy. I am trying to get the link to the next page from this site: https://book24.ru/knigi-bestsellery/?section_id=1592

What the HTML looks like: [screenshot]

In the Scrapy shell I ran this command:

response.css('li.pagination__button-item._next a::attr(href)')

It returns an empty list.

I also tried:

response.css('a.pagination__item._link._button._next.smartLink')

But it also returns an empty list.

I would appreciate any help!

Answer 1

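The page is generated with JavaScript, so the pagination links are not in the HTML that Scrapy downloads, and selectors that target them match nothing. Use 'view(response)' to see what Scrapy actually received. The next-page URL is still available from a link element in the head: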

# with css:
In [1]: response.css('head > link:nth-child(28) ::attr(href)').get()
Out[1]: 'https://book24.ru/knigi-bestsellery/page-2/'

# with xpath:
In [2]: response.xpath('//link[@rel="next"]/@href').get()
Out[2]: 'https://book24.ru/knigi-bestsellery/page-2/'

Answer 2

I'd like to add to @SuperUser's answer. Seeing as the site loads the HTML via JavaScript, please read the documentation on how to handle JavaScript websites. scrapy-playwright is a recent library that I have found to be quite fast and easy to use when scraping JS-rendered websites.

Tags: css, href, scrapy, scrapy-shell