Web scraping Stack Overflow with Scrapy, but I can't get the votes for each question
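I'm scraping Stack Overflow. I've already scraped the title, URL, and tags, but I can't scrape the votes for each question. Can anyone help me? I'm not very good with XPath.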
    def parse_item(self, response):
        questions = response.xpath('//div[@class="question-summary"]')
        for question in questions:
            item = StackItem()
            item['url'] = question.xpath(
                'div[@class="summary"]/h3/a[@class="question-hyperlink"]/@href').extract()[0]
            item['title'] = question.xpath(
                'div[@class="summary"]/h3/a[@class="question-hyperlink"]/text()').extract()[0]
            item['tags'] = question.xpath(
                'div[@class="summary"]/div[2]/a[@class="post-tag"]/text()').extract()
            # This is the line that fails to return the votes.
            item['votes'] = question.xpath(
                '/div[1]/div[1]/div[1]/div[1]/span/strong/textContent()').extract()[0]
            yield item
How about this?
    item['votes'] = question.css('.vote-count-post > strong::text').extract()[0]
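If you want to use XPath:
    item['votes'] = question.xpath(".//div[@class='votes']//strong/text()").extract_first()
Note the dot at the start of the .//div XPath. It makes the expression relative to the current question node instead of the document root. Check the Scrapy documentation on selectors for details.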