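Why does scrapy repeatedly scrape one result only?
Please help me with this problem: the spider code below is expected to return all of the jobs listed at the start URL. However, it only returns the first job, once for every row. The XPath expression tested correctly in "XPath Checker". What is wrong? Thanks for your input!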
from scrapy.spiders import Spider
from scrapy.selector import Selector
from Testjobs.items import TestjobsItem, TestjobsItemLoader


class TestjobSpider(Spider):
    name = "test"
    allowed_domains = ['http://careers.pathologyjobstoday.org/']
    start_urls = [
        'http://careers.pathologyjobstoday.org/jobseeker/search/results'
    ]

    def parse(self, response):
        hxs = Selector(response)
        sites = hxs.xpath('//tr[contains(@id, "jt_jobrow_")]')
        for site in sites:
            il = TestjobsItemLoader(response=response, selector=site)
            il.add_xpath('title', 'normalize-space(//div[@class="jt_jobs_title"]/text())')
            yield il.load_item()
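You need to make the "inner" XPath context-specific by prepending a dot:
normalize-space(.//div[@class="jt_jobs_title"]/text())
            HERE^
Without the dot, // starts from the document root instead of the current row, so every iteration matches the title divs of the whole page and normalize-space() keeps only the first of them.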