How to use Scrapy on a website that does not change the URL when changing language
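As far as I can tell, when the language button is pressed, the website https://www.learnit.nl/ fetches the English version by sending a POST request to https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1. I don't know how to replicate this with Scrapy. Any help would be appreciated.
The data is in the JSON response of an API call made with the POST method, where the payload is a large JSON object. To replicate this with Scrapy, you can follow this example: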
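import json

import scrapy


class CourseSpider(scrapy.Spider):
    name = 'course'

    # Paste the JSON payload copied from the browser's network tab here.
    body = {}

    def start_requests(self):
        yield scrapy.Request(
            url='https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            # The original left the headers empty; a JSON Content-Type is
            # assumed here because the body is serialized JSON.
            headers={
                'Content-Type': 'application/json',
            },
        )

    def parse(self, response):
        # The API returns JSON; the translated strings are under 'to_words'.
        data = json.loads(response.text)
        for word in data['to_words']:
            yield {
                'course': word
            }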
Output:
{'course': 'Writing clear texts'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML e-mail'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Basics'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Continued'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML Training E-learning'}
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 1.879555,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 4, 28, 16, 3, 22, 536326),
'httpcompression/response_bytes': 36269,
'httpcompression/response_count': 1,
'item_scraped_count': 514,
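... and so on.
The payload is left out of the example because it is a large JSON object and including it would exceed the post length limit.
The complete working code is here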
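Since the payload is not shown in the answer, here is a minimal sketch of how a request body might be assembled and how the response could be parsed, with no network access needed. Only the `to_words` key comes from the answer above. The payload field names (`l_from`, `l_to`, `request_url`, `words`, `t`, `w`), the helper names `build_payload` and `extract_translations`, and the sample strings are assumptions for illustration. The real payload should be copied from the request shown in the browser's network tab.

```python
import json


def build_payload(words, l_from="nl", l_to="en",
                  request_url="https://www.learnit.nl/"):
    """Assemble a translate-request body.

    All field names here are assumed, not taken from the answer;
    compare them with the payload captured in the browser.
    """
    return {
        "l_from": l_from,
        "l_to": l_to,
        "request_url": request_url,
        "words": [{"t": 1, "w": word} for word in words],
    }


def extract_translations(response_text):
    """Return the list stored under 'to_words' in a JSON response body."""
    data = json.loads(response_text)
    return data.get("to_words", [])


# Made-up sample input; the serialized string is what would be passed
# as `body=` to scrapy.Request in the spider.
payload = build_payload(["voorbeeldtekst"])
body = json.dumps(payload)

# Made-up sample response with the same top-level key the spider reads.
sample_response = '{"to_words": ["example text"]}'
print(extract_translations(sample_response))
```

In the spider, `extract_translations(response.text)` could replace the inline `json.loads` call in `parse`, so that a response without `to_words` yields no items instead of raising a `KeyError`.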