Can't a Response return an integer value in Scrapy?
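To find out how many bills carry each MP's signature, I am trying to write a three-level scraper for members of parliament:
- Visit the link of each MP in the list
- From (1), go to the page with information about the MP, including the bills the MP has signed
- From (2), go to the page listing the bill proposals that bear the MP's signature, count them, and assign the count to the ktsayisi variable (this is where the problem occurs)
In the last level, I try to return the number of bills by applying len() to the result of the relevant XPath selector. But apparently I cannot assign the number returned in (3) to a value that is eventually yielded.
Scrapy returns only the visited link instead of the number I want the function to return. Why is that? Can't I write a statement like X = Request(url, callback=function), where the callback that handles the Response returns an integer? How can I solve this?
I want a number in place of these yielded statements: https://www.tbmm.gov.tr/Milletvekilleri/KanunTeklifiUyeninImzasiBulunanTeklifler?donemKod=27&sicil=UqVZp9Fvweo=>
Thanks in advance.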
'''
from scrapy import Spider
from scrapy.http import Request

class MvSpider(Spider):
    name = 'mv'
    allowed_domains = ['tbmm.gov.tr'] #website of the parliament
    start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs

    def parse(self, response):
        mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()
        for link in mv_linkler:
            mutlak_link = response.urljoin(link) #absolute url
            yield Request(mutlak_link, callback = self.mv_analiz)

    def mv_analiz(self, response): #function to analyze the MP
        kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
        kteklif_link = response.urljoin(kteklif_link_path)
        ktsayisi = int(Request(kteklif_link, callback = self.kt_say)) #the value of the number of bill proposals to be requested

    def kt_say(self,response):
        kteklifler = response.xpath("//tr[@valign='TOP']")
        return len(kteklifler)
'''
You can't. furas's explanation pretty much covers why, and I have nothing to add to it. You need to do something like this:
'''
from scrapy import Spider
from scrapy.http import Request

class MvSpider(Spider):
    name = 'mv'
    allowed_domains = ['tbmm.gov.tr'] #website of the parliament
    start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs

    def parse(self, response):
        mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()
        for link in mv_linkler:
            mutlak_link = response.urljoin(link) #absolute url
            yield Request(mutlak_link, callback=self.mv_analiz)

    def mv_analiz(self, response): #function to analyze the MP
        kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
        kteklif_link = response.urljoin(kteklif_link_path)
        item = {}
        req = Request(kteklif_link, callback=self.kt_say) #request the bills page; the count is computed later in kt_say
        req.meta['item'] = item #carry the item along to the next callback
        yield req

    def kt_say(self, response):
        kteklifler = response.xpath("//tr[@valign='TOP']")
        item = response.meta['item'] #the item created in mv_analiz
        item['ktsayisi'] = len(kteklifler) #number of bill proposals
        yield item
'''
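To make the reason more concrete, here is a rough, Scrapy-free sketch of the mechanism. It is only a toy model: MiniRequest, MiniResponse, run_engine and the FAKE_PAGES data are hypothetical names made up for illustration, not Scrapy API. The point it tries to show is that a request object only describes work to be done; the callback runs later, when the engine has a response, so nothing can be "returned" at the place where the request is created. The data has to travel with the request (as with meta above) and be yielded from the last callback.

```python
from collections import deque


class MiniRequest:
    """Hypothetical stand-in for a request: it only *describes* work to do."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class MiniResponse:
    """Hypothetical stand-in for a downloaded page."""

    def __init__(self, url, rows, meta):
        self.url = url
        self.rows = rows
        self.meta = meta


# Fake "website": URL -> rows found on that page (made-up data).
FAKE_PAGES = {
    "mp/1": ["link-to-bills"],
    "bills/1": ["bill A", "bill B", "bill C"],
}


def run_engine(start_request):
    """Toy scheduler: pops a request, 'downloads' it, then runs its callback."""
    queue = deque([start_request])
    items = []
    while queue:
        req = queue.popleft()
        resp = MiniResponse(req.url, FAKE_PAGES[req.url], req.meta)
        for result in req.callback(resp) or ():
            if isinstance(result, MiniRequest):
                queue.append(result)   # follow-up request: schedule it
            else:
                items.append(result)   # anything else is a scraped item
    return items


def analyze_mp(response):
    # Create the item here and hand it to the next callback via meta.
    item = {"mp_url": response.url}
    yield MiniRequest("bills/1", callback=count_bills, meta={"item": item})


def count_bills(response):
    # The count only becomes known here, so the item is finished and yielded here.
    item = response.meta["item"]
    item["ktsayisi"] = len(response.rows)
    yield item


if __name__ == "__main__":
    print(run_engine(MiniRequest("mp/1", callback=analyze_mp)))
```

In this toy model, calling int() on a MiniRequest raises a TypeError, which mirrors what happens with int(Request(...)) in the question. In real Scrapy, Request.cb_kwargs (available since Scrapy 1.7) may be a cleaner way than meta to pass such data to the callback.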