Extracting links with Scrapy that have a specific CSS class

python
screen-scraping
scrapy
web-scraping
scrapy-spider

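A conceptually simple question/idea.

Using Scrapy, how do I use a LinkExtractor so that it extracts and follows only the links that have a given CSS class?

This seems trivial and like something that should already be built in, but I don't see it. Is it?

It looks like I can use XPath, but I'd prefer to use CSS selectors. They don't seem to be supported?

Do I have to write a custom LinkExtractor in order to use CSS selectors?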

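As I understand it, you want something similar to restrict_xpaths, but taking a CSS selector instead of an XPath expression.

This is actually a built-in feature in Scrapy 1.0 (currently in release candidate status); the parameter is called restrict_css: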

restrict_css

a CSS selector (or list of selectors) which defines regions inside the response where links should be extracted from. Has the same behaviour as restrict_xpaths.

The initial feature request:

CSS support in link extractors
