How to save JSON responses with asynchronous requests?
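I have a question about asynchronous requests: how do I save response.json() to a file on the fly?
I want to make a request and save the response to a .json file without holding it in memory.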

    import asyncio
    import aiohttp

    async def fetch(sem, session, url):
        async with sem:
            async with session.get(url) as response:
                return await response.json()  # here

    async def fetch_all(urls, loop):
        sem = asyncio.Semaphore(4)
        async with aiohttp.ClientSession(loop=loop) as session:
            results = await asyncio.gather(
                *[fetch(sem, session, url) for url in urls]
            )
            return results

    if __name__ == '__main__':
        urls = (
            "https://public.api.openprocurement.org/api/2.5/tenders/6a0585fcfb05471796bb2b6a1d379f9b",
            "https://public.api.openprocurement.org/api/2.5/tenders/d1c74ec8bb9143d5b49e7ef32202f51c",
            "https://public.api.openprocurement.org/api/2.5/tenders/a3ec49c5b3e847fca2a1c215a2b69f8d",
            "https://public.api.openprocurement.org/api/2.5/tenders/52d8a15c55dd4f2ca9232f40c89bfa82",
            "https://public.api.openprocurement.org/api/2.5/tenders/b3af1cc6554440acbfe1d29103fe0c6a",
            "https://public.api.openprocurement.org/api/2.5/tenders/1d1c6560baac4a968f2c82c004a35c90",
        )
        loop = asyncio.get_event_loop()
        data = loop.run_until_complete(fetch_all(urls, loop))
        print(data)

Right now the script only prints the JSON documents. Once they have all been fetched, I can save them:

    data = loop.run_until_complete(fetch_all(urls, loop))
    for i, resp in enumerate(data):
        with open(f"{i}.json", "w") as f:
            json.dump(resp, f)

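But this does not feel right to me, because it will certainly fail as soon as I run out of memory.
Any suggestions?
Edit
Limited my post to a single question.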
How do I save response.json() to a file, on the fly?
Don't use response.json() in the first place; use the streaming API instead:

    async def fetch(sem, session, url):
        async with sem, session.get(url) as response:
            # Write the body to disk chunk by chunk instead of parsing it in memory.
            with open("some_file_name.json", "wb") as out:
                async for chunk in response.content.iter_chunked(4096):
                    out.write(chunk)

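The snippet above writes every response to the same file name. Here is a rough sketch of how it might fit into the full script from the question, so that each URL gets its own file. The `filename_for` helper, the `out_dir` parameter, and the choice of naming files after the last URL path segment are assumptions made for illustration; they do not come from the original post.

```python
import asyncio
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

import aiohttp

CHUNK_SIZE = 4096  # bytes read from the response per iteration


def filename_for(url):
    """Hypothetical naming scheme: last path segment of the URL plus '.json'."""
    return PurePosixPath(urlparse(url).path).name + ".json"


async def fetch(sem, session, url, out_dir="."):
    """Stream one response body to its own file and return the file path."""
    target = Path(out_dir) / filename_for(url)
    async with sem, session.get(url) as response:
        # Fail early on 4xx/5xx instead of saving an error page as JSON.
        response.raise_for_status()
        with open(target, "wb") as out:
            # Only one chunk per request is held in memory at a time.
            async for chunk in response.content.iter_chunked(CHUNK_SIZE):
                out.write(chunk)
    return target


async def fetch_all(urls, out_dir="."):
    """Download all URLs, at most four at a time, and return the saved paths."""
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch(sem, session, url, out_dir) for url in urls)
        )
```

With the `urls` tuple from the question, this could be started with `asyncio.run(fetch_all(urls))`. Note that `out.write` is an ordinary blocking call, so each write briefly holds up the event loop; with 4096-byte chunks that is usually a small cost, but it is a trade-off to be aware of.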