Scrapy: get an href from inside an H3 tag?

Question

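I'm trying to scrape the link and the title from the following HTML snippet, but even after reading the Scrapy documentation for a while I can't find a way to do it.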

<h3 class="data"> 
  <a href="example.com" title="uniqueTitle"></a>
</h3>

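What is the best way to do this? I should also note that the page contains many <h3> elements with the same class, each with a different <a> tag that I want to scrape.
Thanks in advance!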

Answer 1

To get all the URLs inside the h3 tags, you can use, for example:

from scrapy import Selector
sel = Selector(text='''<h3 class="data"> 
  <a href="example.com" title="uniqueTitle"></a>
</h3>''')
# Select every <a> that is a direct child of <h3 class="data"> and read its href.
print(sel.css('h3.data > a::attr(href)').getall())  # .extract() is the older alias

Output:

['example.com']

Tags: python, scrapy, scrapy-shell