Why is scrapy-splash not sending the correct URL?

I am using Splash to render JavaScript, but it sends the wrong URL. To be precise, it sends the URL of the previous request. Take a look at this code.

from scrapy import Request

# Both functions below are methods of the spider class.
def parse(self, response):
    # Note: this one dict object is passed to both requests below.
    splash_args = {'html': 1, 'png': 0}
    url = 'http://quotes.toscrape.com/js'
    yield Request(url,
                  self.parse_result,
                  meta={'splash': {
                      'endpoint': 'render.html',
                      'args': splash_args,
                      'splash_url': 'http://localhost:8050'
                  }})

    url = 'https://www.google.com'
    yield Request(url,
                  self.parse_result,
                  meta={'splash': {
                      'endpoint': 'render.html',
                      'args': splash_args,
                      'splash_url': 'http://localhost:8050'
                  }})

def parse_result(self, response):
    # Print the URL that the response reports.
    print(response.url)

I am running Splash in a Docker container. In the Docker logs I see this:

2020-08-02 05:34:09.061509 [events] {"active": 1, "status_code": 200, "args": {"headers": {"User-Agent": "Scrapy/2.2.0 (+https://scrapy.org)", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "html": 1, "png": 0, "url": "http://quotes.toscrape.com/js", "uid": 140386374564776}, "client_ip": "172.17.0.1", "qsize": 0, "user-agent": "Scrapy/2.2.0 (+https://scrapy.org)", "load": [0.1, 0.08, 0.06], "path": "/render.html", "fds": 22, "method": "POST", "maxrss": 746168, "rendertime": 0.109375, "_id": 140386374564776, "timestamp": 1596346449}
2020-08-02 05:34:09.062780 [-] "172.17.0.1" - - [02/Aug/2020:05:34:08 +0000] "POST /render.html HTTP/1.1" 200 8974 "-" "Scrapy/2.2.0 (+https://scrapy.org)"

2020-08-02 05:34:09.072852 [events] {"active": 0, "status_code": 200, "args": {"headers": {"User-Agent": "Scrapy/2.2.0 (+https://scrapy.org)", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en"}, "html": 1, "png": 0, "url": "http://quotes.toscrape.com/js", "uid": 140386500587760}, "client_ip": "172.17.0.1", "qsize": 0, "user-agent": "Scrapy/2.2.0 (+https://scrapy.org)", "load": [0.1, 0.08, 0.06], "path": "/render.html", "fds": 22, "method": "POST", "maxrss": 746168, "rendertime": 0.13172173500061035, "_id": 140386500587760, "timestamp": 1596346449}
2020-08-02 05:34:09.073582 [-] "172.17.0.1" - - [02/Aug/2020:05:34:08 +0000] "POST /render.html HTTP/1.1" 200 8974 "-" "Scrapy/2.2.0 (+https://scrapy.org)"

Both requests carry the same URL, 'quotes.toscrape.com', and there is no request for 'www.google.com'.

I don't see google.com in the standard output either.

2020-08-02 15:34:09 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://localhost:8050/render.html> (referer: None)
2020-08-02 15:34:09 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://localhost:8050/render.html> (referer: None)
http://quotes.toscrape.com/js
http://quotes.toscrape.com/js
2020-08-02 15:34:09 [scrapy.core.engine] INFO: Closing spider (finished)

response.url only printed quotes.toscrape.com. I am sure both requests were executed, because the logs show two requests being sent. Only the URL is wrong. Please help.
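One plausible explanation, offered as an assumption rather than a confirmed diagnosis: both requests share the same `splash_args` dict, and scrapy-splash appears to fill in `args['url']` only when that key is missing. If so, the first request writes its URL into the shared dict and the second request silently reuses it. The sketch below reproduces that effect in plain Python. `fill_splash_url`, `run`, and `BASE_ARGS` are hypothetical names for illustration, not part of scrapy-splash.

```python
BASE_ARGS = {"html": 1, "png": 0}
URLS = ["http://quotes.toscrape.com/js", "https://www.google.com"]


def fill_splash_url(request_url, meta):
    """Hypothetical stand-in for the step assumed to fill in the URL."""
    args = meta["splash"].setdefault("args", {})
    # setdefault leaves an existing 'url' untouched, which is the assumed culprit.
    args.setdefault("url", request_url)
    return args["url"]


def run(urls, share_args):
    """Return the URL that would be sent to Splash for each request."""
    shared = dict(BASE_ARGS)  # one dict reused by every request when share_args is True
    sent = []
    for url in urls:
        args = shared if share_args else dict(BASE_ARGS)  # fresh copy otherwise
        sent.append(fill_splash_url(url, {"splash": {"args": args}}))
    return sent


if __name__ == "__main__":
    print(run(URLS, share_args=True))   # shared dict, as in the spider above
    print(run(URLS, share_args=False))  # separate dict per request
```

With the shared dict, both entries come out as the quotes.toscrape.com URL, which matches the Splash log above. With a separate dict per request, each request keeps its own URL.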

It seems that passing the URL to Request(url) is not enough. I also had to add the URL in meta['splash']['url'], and then it worked.
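A minimal sketch of that fix, assuming the key that matters is `url` inside `args` (that is where the Splash log above reports it). The helper `splash_meta` is a hypothetical name. It builds a new `args` dict for every request, so nothing is shared between requests, and it sets the URL explicitly.

```python
def splash_meta(url, splash_url="http://localhost:8050"):
    """Build a per-request meta dict with its own args and an explicit URL."""
    return {
        "splash": {
            "endpoint": "render.html",
            # A new dict on every call, so requests never share state.
            "args": {"html": 1, "png": 0, "url": url},
            "splash_url": splash_url,
        }
    }


if __name__ == "__main__":
    for target in ("http://quotes.toscrape.com/js", "https://www.google.com"):
        print(splash_meta(target)["splash"]["args"]["url"])
```

Inside the spider this would be used as `yield Request(url, self.parse_result, meta=splash_meta(url))`. Copying the original dict per request, for example with `dict(splash_args)`, should have the same effect if the shared dict is indeed the cause.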