Why can't Scrapy find an XPath that my browser finds?
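I am new to Scrapy. With the code below I can extract the name, but not the price.
Any idea what I am doing wrong when trying to get the price? Thanks!
Here is the code: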
import scrapy


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['https://www.boerse-frankfurt.de']
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        what_name = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[1]/div/app-widget-datasheet-header/div/div/div/div/div[1]/div/h1/text()').extract_first()
        what_price = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[2]/div[3]/div[1]/font/text()').extract_first()
        yield {'name': what_name, 'price': what_price}
These are the items (marked in red), the name and the price:
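The 'name' information is available directly in the page HTML, but the 'price' information is loaded from an API. Scrapy does not execute JavaScript, so an XPath that matches in the browser's rendered DOM can return nothing for the raw response Scrapy downloads. If you inspect the network traffic, you will find an API call that returns the price information. See the example below for how to fetch this data.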
import scrapy
from time import time


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    # Domain names only, without the URL scheme.
    allowed_domains = ['boerse-frankfurt.de']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'
    }
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        item = {}
        current_time = int(time())

        # These values are present in the HTML that Scrapy receives.
        name = response.xpath('//h1/text()').get()
        isin = response.xpath("//span[contains(text(),'ISIN:')]/text()").re_first(r"ISIN:\s(.*)$")
        mic = response.xpath("//app-widget-index-price-information/@mic").get()

        # The price comes from a separate API request, identified by MIC and ISIN.
        api_url = (
            "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"
            "?resolution=D&isKeepResolutionForLatestWeeksIfPossible=false"
            f"&from={current_time}&to={current_time}"
            f"&isBidAskPrice=false&symbols={mic}%3A{isin}"
        )

        item['name'] = name
        item['isin'] = isin
        item['mic'] = mic
        # Pass the partially filled item on to the callback that adds the price.
        yield response.follow(api_url, callback=self.parse_price, cb_kwargs={"item": item})

    def parse_price(self, response, item):
        item['price'] = response.json()[0]['quotes']['timeValuePairs'][0]['value']
        yield item
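As a side note, the query string of the API URL can also be assembled with the standard library instead of by hand. This is only a sketch: the helper name build_price_url and the constant API_BASE are made up for illustration, and the parameter names are copied from the URL used in the spider, not from any API documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the spider's URL; not verified against API docs.
API_BASE = "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"


def build_price_url(mic, isin, timestamp):
    """Return the price URL for one instrument (hypothetical helper)."""
    params = {
        "resolution": "D",
        "isKeepResolutionForLatestWeeksIfPossible": "false",
        "from": timestamp,
        "to": timestamp,
        "isBidAskPrice": "false",
        # urlencode percent-encodes the colon, so "XFRA:XS..." becomes "XFRA%3AXS..."
        "symbols": f"{mic}:{isin}",
    }
    return f"{API_BASE}?{urlencode(params)}"
```

Inside parse, api_url = build_price_url(mic, isin, current_time) would then replace the hand-written string.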
Running the spider above yields a dictionary similar to the one below:
{'name': 'FCE Bank PLC 1,134% 15/22', 'isin': 'XS1186131717', 'mic': 'XFRA', 'price': 99.955}
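The chained lookup in parse_price raises an exception if the API returns an empty list or no quotes for the requested time window. A defensive variant might look like the sketch below. The helper name extract_price is hypothetical, and the sample payload only mirrors the nesting that the lookup implies; its "time" key and its values are assumptions, not a captured API response.

```python
def extract_price(payload):
    """Return the first quoted value from the decoded JSON, or None if it is missing."""
    try:
        return payload[0]["quotes"]["timeValuePairs"][0]["value"]
    except (IndexError, KeyError, TypeError):
        # Empty list, missing key, or an unexpected payload type.
        return None


# Assumed payload shape, inferred from the lookup in parse_price.
sample = [{"quotes": {"timeValuePairs": [{"time": 1700000000, "value": 99.955}]}}]
```

In the spider, item['price'] = extract_price(response.json()) would then leave the price as None instead of failing the callback.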