How to extract text from a tag that is embedded under h2 using scrapy?
I want to extract the name from the tag.
response.css('h2.product-names::text').get()
But it is returning:
<h2 class="product-names">
<a target="_blank" href="https://www.electronicsbazaar.com/dell-inspiron-13-7348-core-i5-5200u-2-20ghz-8gb-500gb-int-webcam-win-10-13-3-touch" title='Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)'>\n Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch) </a>
</h2>
How do I get the text of the link?
I tried:
response.css('h2.product-names').get()
<h2 class="product-names">
<a target="_blank" href="https://www.electronicsbazaar.com/dell-inspiron-13-7348-core-i5-5200u-2-20ghz-8gb-500gb-int-webcam-win-10-13-3-touch" title='Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)'>\n Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch) </a>
</h2>
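The problem, if I am reading your screenshot correctly, is that the name is contained in the <a> tag nested inside the h2, so ::text applied to the h2 itself only sees the whitespace around the link.
The correct XPath, which reads the name from the link's title attribute, is: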
response.xpath('//h2[@class="product-names"]/a/@title').extract()