How to scrape an item with no reference or name attribute?
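Hi, I'm really new to Scrapy. I've tried the basic code, but this markup is unusual, and I've tried different approaches here. How can I get the number of likes, loves and informative ratings on this page?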
https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/
Here is the relevant HTML:
<ul class="dark_postrating_outputlist">
<li>
<i class="fa fa-info-circle"></i> Informative x <strong>1</strong>
</li>
<li>
<i class="fa fa-thumbs-o-up"></i> Like x <strong>1</strong>
</li>
</ul>
I want to get the specific items inside it.
I tried this:
response.css('ul.dark_postrating_outputlist i.fa.fa-thumbs-o-up strong::text').extract_first()
But it doesn't work. Any ideas? Thanks.
You can add more specific selectors to separate the "likes" and "informative" data. Check this example:
>>> txt = """<ul class="dark_postrating_outputlist">
... <li>
... <i class="fa fa-info-circle"></i> Informative x <strong>1</strong>
... </li>
... <li>
... <i class="fa fa-thumbs-o-up"></i> Like x <strong>2</strong>
... </li>
... </ul>"""
>>> from scrapy import Selector
>>> sel = Selector(text=txt)
>>> sel.css('ul.dark_postrating_outputlist li:contains("Informative") strong::text').get()
u'1'
>>> sel.css('ul.dark_postrating_outputlist li:contains("Like") strong::text').get()
u'2'
This way you can get each number separately.
Use XPath instead of CSS:
response.xpath('//ul[@class="dark_postrating_outputlist"]/li[i[contains(@class, "fa-thumbs-o-up")]]/strong/text()').get()
Try the following to get what you need:
import scrapy


class TeslamotorsclubSpider(scrapy.Spider):
    name = "teslamotorsclub"
    start_urls = ["https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/"]

    def parse(self, response):
        # Select each post by its id prefix "fc-post-".
        for item in response.css("[id^='fc-post-']"):
            author = item.css(".author::text").get()
            # "+" picks the <strong> that directly follows each rating icon.
            like = item.css(".fa-thumbs-o-up + strong::text").get()
            love = item.css(".fa-heart-o + strong::text").get()
            informative = item.css(".fa-info-circle + strong::text").get()
            # A rating the post did not receive comes back as None.
            yield {"author": author, "like": like, "love": love, "informative": informative}
Partial output:
{'author': 'Unpilot', 'like': '1', 'love': '4', 'informative': '1'}
{'author': 'UnknownSoldier', 'like': '7', 'love': '2', 'informative': '1'}
{'author': 'SpaceCash', 'like': '2', 'love': '15', 'informative': '2'}
{'author': 'gene', 'like': '45', 'love': '18', 'informative': '1'}
{'author': 'engle', 'like': '31', 'love': '5', 'informative': '15'}
{'author': 'Unpilot', 'like': '11', 'love': '3', 'informative': None}
{'author': 'SebastianR', 'like': '3', 'love': None, 'informative': None}
{'author': 'Buckminster', 'like': '1', 'love': '4', 'informative': None}