Trying to scrape a table gives empty output

Question

I want to scrape a table, but my spider gives me empty output. This is the page link: https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/

from scrapy.http import Request
import scrapy
class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    page_number = 1
    start_urls = ['https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/']
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
    }



    def parse(self, response):
        details={}
        key=response.xpath("//table//tbody/tr/td[1]/text()").get()
        value=response.xpath("//table//tbody/tr/td[2]/text()").get()
        details[key]=value
        
        yield details

Answer 1

The XPath selection was a bit tricky to get right. Now it is working.

import scrapy


class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    start_urls = [
        'https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important']

    def parse(self, response):
        details = {}
        # Match the label cell by its text rather than by its position in the table
        key = response.xpath("//td[contains(.,'Source')]/text()").get()
        # The value is the first cell that follows the 'Source' cell in the same row
        value = response.xpath("//td[contains(.,'Source')]/following-sibling::td/text()").get()
        details[key] = value

        yield details

Output:

{'Source': 'Sigmoid sinus and Inferior petrosal sinus'}

python

scrapy

web-scraping