Scrapy: continue processing the result of a parse function

Question

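I am trying to parse page A, download the files listed on it to the local disk, replace the URLs in page A with the paths of the files I saved, and finally save page A to the local disk.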

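I tried the FilesPipeline, but it did not work. The URLs in page A look like http:...php?id=1234, so the built-in file_path() returns an error. Overriding file_path() just makes the pipeline stop working, without any debug output.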
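Since the problem is that URLs like http:...php?id=1234 carry no usable file name, one option is to derive the name from the query string yourself. A minimal sketch, assuming the site identifies each attachment by an `id` query parameter; `attachment_filename`, the `attachments/` folder and the `.bin` default extension are hypothetical choices, not Scrapy API:

```python
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit


def attachment_filename(url: str, ext: str = ".bin") -> str:
    """Return a local path for an attachment URL such as .../download.php?id=1234.

    Hypothetical helper: the 'id' query parameter, the 'attachments/' folder
    and the default extension are assumptions about the target site.
    """
    parts = urlsplit(url)
    ids = parse_qs(parts.query).get("id")
    if ids:
        # Keep only characters that are safe in a file name.
        stem = "id-" + re.sub(r"[^A-Za-z0-9_-]", "_", ids[0])
    else:
        # No id parameter: fall back to the last path segment, e.g. 'report'.
        stem = PurePosixPath(parts.path).stem or "attachment"
    return f"attachments/{stem}{ext}"


print(attachment_filename("http://example.com/download.php?id=1234"))
# prints: attachments/id-1234.bin
```

In a FilesPipeline subclass, an overridden `file_path` could simply return `attachment_filename(request.url)`. Check the exact `file_path` signature for your Scrapy version (recent versions use `file_path(self, request, response=None, info=None, *, item=None)`); an override whose signature does not match may fail in ways that are easy to miss.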

So I found this post:

Answer I referred to

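After applying it, I found that the parse function (the callback) does not change the data I pass in through meta. My code looks like this: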

import scrapy

def ParseClientCaseNote(self, response):
    # Download all attachments and replace the URLs inside page A so they point to local files.
    # AttachmentList is built elsewhere in the spider (not shown).
    TestMeta = 'this is to test meta argu'
    for a in AttachmentList:
        yield scrapy.Request(a, callback=self.DownClientCaseNoteAttach, meta={'test': TestMeta})

    self.logger.info('ParseClientCaseNote: after call DownClientCaseNoteAttach, testmeta is: ' + TestMeta)

    return

def DownClientCaseNoteAttach(self, response):
    TestArg = response.meta['test']
    self.logger.info('DownClientCaseNoteAttach: test meta')
    self.logger.info(TestArg)
    # Rebinds the local name only; response.meta['test'] and TestMeta are unchanged.
    TestArg = 'this is revised from DownClientCaseNoteAttach'

    # AbsPath (the target file path) is defined elsewhere (not shown).
    with open(AbsPath, 'wb') as f:
        f.write(response.body)
    return

I get the following output in the log:

2018-09-29 09:26:13 [debug] INFO: ParseClientCaseNote: after call DownClientCaseNoteAttach, testmeta is: this is to test meta argu
2018-09-29 09:26:17 [debug] INFO: DownClientCaseNoteAttach: test meta
2018-09-29 09:26:17 [debug] INFO: this is to test meta argu
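The order in this log is what Scrapy normally produces. Yielding a scrapy.Request only schedules it; its callback runs after the response has been downloaded, so the line logged after the loop in ParseClientCaseNote usually runs first. Separately, `TestArg = '...'` only rebinds a local name, and a Python str cannot be modified in place, so nothing the callback does to TestArg can reach TestMeta. The toy model below is plain Python, not Scrapy's engine (which downloads concurrently through Twisted); `parse_page`, `download_attachment` and `run` are hypothetical names mirroring the two methods above, and the value is wrapped in a dict so the callback can mutate it:

```python
from collections import deque

log = []


def parse_page(attachment_urls):
    # Mirrors ParseClientCaseNote: yields one "request" per attachment.
    test_meta = {"status": "original"}
    for url in attachment_urls:
        yield (url, download_attachment, {"test": test_meta})
    # Runs when the generator is exhausted, before any callback has run.
    log.append(("parse_page", test_meta["status"]))


def download_attachment(url, meta):
    # Mirrors DownClientCaseNoteAttach.
    shared = meta["test"]
    log.append(("download_attachment", shared["status"]))
    # Mutating the shared dict is visible to later callbacks,
    # but parse_page has already logged its value.
    shared["status"] = "revised by " + url


def run(requests):
    # Toy scheduler: collect every yielded request first, then "download" each one.
    queue = deque(requests)
    while queue:
        url, callback, meta = queue.popleft()
        callback(url, meta)


run(parse_page(["a1", "a2"]))
for entry in log:
    print(entry)
```

The first entry is logged by `parse_page` with the original value, matching the first line of the log above; only the second callback sees the first callback's change, because it mutated a shared object instead of rebinding a name.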

The parse function seems to be deferred (it runs later). How can I get the correct result?
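One way to get the intended result is to stop expecting data to flow back into ParseClientCaseNote and let the last attachment callback finish page A instead. Scrapy copies the meta dict shallowly when it creates a Request, so a mutable object stored in it is shared by every request that carries it. A minimal plain-Python sketch; `PageAState`, `attachment_done` and `finish` are hypothetical names, and the plain string replacement is a simplification (for example, a URL containing `&` usually appears as `&amp;` in HTML):

```python
class PageAState:
    """Hypothetical per-page state, passed to every attachment request via meta."""

    def __init__(self, html, attachment_urls):
        self.html = html
        self.pending = set(attachment_urls)  # attachments not yet handled
        self.local_paths = {}                # remote URL -> local file path
        self.saved_html = None               # set once page A is finished
        if not self.pending:
            self.finish()                    # nothing to wait for

    def attachment_done(self, url, local_path):
        # Call from the download callback; call it with local_path=None from an
        # errback so a failed download does not leave page A waiting forever.
        if local_path is not None:
            self.local_paths[url] = local_path
        self.pending.discard(url)
        if not self.pending and self.saved_html is None:
            self.finish()

    def finish(self):
        html = self.html
        # Longest URLs first, so '...id=1' cannot clobber part of '...id=12'.
        for remote in sorted(self.local_paths, key=len, reverse=True):
            html = html.replace(remote, self.local_paths[remote])
        # A real spider would write html to disk here.
        self.saved_html = html


# Simulate two callbacks arriving in any order.
state = PageAState(
    '<a href="http://x/dl.php?id=1">a</a> <a href="http://x/dl.php?id=12">b</a>',
    ["http://x/dl.php?id=1", "http://x/dl.php?id=12"],
)
state.attachment_done("http://x/dl.php?id=12", "attachments/id-12.bin")
state.attachment_done("http://x/dl.php?id=1", "attachments/id-1.bin")
print(state.saved_html)
```

In a spider, ParseClientCaseNote would create one state object per page A and yield each attachment request with something like `meta={'page': state, 'attachment_url': a}`; DownClientCaseNoteAttach would write `response.body` and then call `response.meta['page'].attachment_done(response.meta['attachment_url'], path)`. Keeping the original URL in meta avoids depending on `response.url`, which can differ after a redirect. Since Scrapy 1.7, `cb_kwargs` is an alternative to meta for passing such objects.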

Thanks

Answer 1

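I used a workaround. On page A, I take the file name shown on the web page, pass it to my own download function, and change the URL so it points to a local file with that same name. In the download function, before saving, I check the file name from response.headers['Content-Disposition'].decode(response.headers.encoding) to make sure it is the same one I found on page A.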
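The file-name check in this answer can be done with the standard library instead of splitting the header string by hand: `email.message.Message.get_filename()` reads the `filename` parameter of a Content-Disposition header and strips the surrounding quotes. A minimal sketch; the function names are hypothetical, and the raw value is assumed to be the bytes Scrapy stores in `response.headers`, decoded with `response.headers.encoding` as in the answer:

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(raw: Optional[bytes], encoding: str = "utf-8") -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not raw:
        return None  # header missing or empty
    msg = Message()
    msg["Content-Disposition"] = raw.decode(encoding)
    return msg.get_filename()  # None if there is no filename parameter


def matches_page_a_name(raw: Optional[bytes], expected: str, encoding: str = "utf-8") -> bool:
    """Hypothetical check: is the served file name the one listed on page A?"""
    return filename_from_content_disposition(raw, encoding) == expected


print(filename_from_content_disposition(b'attachment; filename="case note 1.pdf"'))
# prints: case note 1.pdf
```

If the check fails, the download function can log a warning and skip saving rather than overwrite a file whose name page A already points to.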

