My XPath is good, but I get nothing with Scrapy
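I am trying to scrape a single page with Scrapy. I found the XPath with FireXpath (a Firefox plugin), and it looks correct there. But with Scrapy I get no results.
My Python program looks like this: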
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider
from datetime import datetime
from scrapy.spider import BaseSpider


class robtex(BaseSpider):
    # Crawling Start
    CrawlSpider.started_on = datetime.now()

    # CrawlSpider
    name = 'robtex'
    DOWNLOAD_DELAY = 3
    start_urls = ["https://www.whois.com/en/advisory/dns/com/Whosebug/whois.html"]

    def parse(self, response):
        # Selector
        sel = Selector(response)
        print sel.xpath(".//*[@id='datawhois']/div[2]/table[3]/tbody/tr[3]/td[2]/a/text()").extract()
How can I fix this?
Thanks in advance.
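You just need to remove tbody from the XPath expression. Browsers add tbody to tables when building the DOM even if the HTML source omits it, so an XPath found with a browser tool such as FireXpath can contain a step that is absent from the raw HTML Scrapy receives. Without it, the expression becomes:
.//*[@id='datawhois']/div[2]/table[3]/tr[3]/td[2]/a/text()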
Demo:
$ scrapy shell https://www.robtex.com/en/advisory/dns/com/Whosebug/whois.html
In [1]: response.xpath(".//*[@id='datawhois']/div[2]/table[3]/tbody/tr[3]/td[2]/a/text()").extract()
Out[1]: []
In [2]: response.xpath(".//*[@id='datawhois']/div[2]/table[3]/tr[3]/td[2]/a/text()").extract()
Out[2]: [u'whosebug.com']
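The same effect can be reproduced offline. What follows is only a minimal sketch, not part of the original answer: the markup is made up, `strip_tbody` is a hypothetical helper, and the standard library's ElementTree stands in for Scrapy's selector (it supports only a subset of XPath, so the `text()` step is left out and the element's text is read directly).

```python
import re
import xml.etree.ElementTree as ET

# Made-up markup standing in for the raw page source: there is no <tbody>.
RAW_HTML = """
<div id="datawhois">
  <table>
    <tr><td>domain</td><td><a href="#">example.com</a></td></tr>
  </table>
</div>
"""


def strip_tbody(xpath):
    """Hypothetical helper: drop every /tbody step from a browser-copied XPath."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Wrap the fragment so the div is a descendant of the parsed root element.
root = ET.fromstring("<root>" + RAW_HTML + "</root>")

browser_xpath = ".//*[@id='datawhois']/table/tbody/tr/td[2]/a"
fixed_xpath = strip_tbody(browser_xpath)

# The browser-style path finds nothing; the path without tbody finds the link.
with_tbody = [a.text for a in root.findall(browser_xpath)]
without_tbody = [a.text for a in root.findall(fixed_xpath)]

print(with_tbody)     # []
print(without_tbody)  # ['example.com']
```

Stripping tbody blindly assumes the page source really omits it. If a site does write tbody explicitly in its HTML, the original expression is the one to keep, so it is worth checking the raw source (for example with scrapy shell) before changing the XPath.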