How to use Scrapy on a website that does not change the URL when changing language

Question

As far as I can tell, when the language button is pressed, this website https://www.learnit.nl/ fetches the English version by sending a POST request to https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1. I don't know how to replicate this with Scrapy. Any help would be appreciated.

Answer 1

The data comes from the JSON response of an API call made with the POST method, where the payload is a large JSON object. To replicate this with Scrapy, you can follow this example:

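import json

import scrapy


class CourseSpider(scrapy.Spider):
    name = 'course'
    # Add the payload here: the large JSON body of the POST request.
    body = {}

    def start_requests(self):
        # Replay the POST request the site sends when the language changes.
        yield scrapy.Request(
            url='https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            headers={
                # Add any request headers the API requires here.
            },
        )

    def parse(self, response):
        # Decode the JSON response body into a dict.
        data = json.loads(response.body)

        # Yield one item per entry in 'to_words'.
        for resp in data['to_words']:
            yield {
                'course': resp
            }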

Output:

{'course': 'Writing clear texts'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML e-mail'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Basics'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Continued'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML Training E-learning'}

 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 1.879555,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 4, 28, 16, 3, 22, 536326),
 'httpcompression/response_bytes': 36269,
 'httpcompression/response_count': 1,
 'item_scraped_count': 514,

... and so on
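The parsing step of the spider can be tried without sending any request. The following is a minimal sketch, assuming the response is a JSON object with a `to_words` list, as the spider's `parse` method reads it. The function name `extract_courses` and the two-entry sample body are made up for illustration and are not part of the original answer.

```python
import json


def extract_courses(raw_body):
    """Yield one item per entry in the 'to_words' list of a JSON body."""
    data = json.loads(raw_body)
    # .get() with a default avoids a KeyError if 'to_words' is missing.
    for word in data.get("to_words", []):
        yield {"course": word}


# Made-up sample shaped like the response the spider reads.
sample = json.dumps({"to_words": ["Writing clear texts", "HTML e-mail"]}).encode()
items = list(extract_courses(sample))
print(items)
```

Inside the spider, the same function could be called as `yield from extract_courses(response.body)`.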

The payload is a large JSON object, so it cannot be posted here without exceeding the length limit. The complete working code is here
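Since the payload is too large to paste inline, one option is to keep it in a separate file and load it when the spider starts. This is only a sketch under assumptions: the file name `payload.json` and the helper `load_payload` are hypothetical, and the placeholder data below stands in for the real payload, whose fields are not shown in the answer.

```python
import json
import tempfile
from pathlib import Path


def load_payload(path):
    """Read a JSON payload that was saved to a file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Stand-in data only: this is NOT the real payload structure.
placeholder = {"placeholder_key": ["placeholder value"]}

with tempfile.TemporaryDirectory() as tmp:
    payload_file = Path(tmp) / "payload.json"  # hypothetical file name
    payload_file.write_text(json.dumps(placeholder), encoding="utf-8")
    payload = load_payload(payload_file)

# Serialized string suitable for the body= argument of scrapy.Request.
body = json.dumps(payload)
print(body)
```

In the spider, the class attribute `body` could then be set from `load_payload(...)` instead of a literal dict.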

scrapy

web-scraping

python-3.x