Open a new file for every n elements
I have n batches, each containing 100 API requests. Batch 1 contains files [1-100], batch 2 contains files [101-200], and so on...
I want to dump each of these into a JSON file. That works fine. However, I want to dump 100k JSON responses into one file, then create a new file and dump the next 100k observations into it.
I need to set up a function that creates a file based on the batch number, and I tried the following:
def open_file(self, batch):
    if batch % 1000 == 0:
        filename = f"data_{batch}.json"
    else:
        filename = ""
    f = open(filename, "a")
    return f
If batch % 1000 == 0, I want to change the name (because batch number 1000 -> 1000 batches of 100 JSON requests = 100k in total). However, this obviously doesn't work, because when batch 1001 is evaluated, the old file is opened again. How can I create one file for batches 1-999, another for batches 1000-1999, then another for 2000-2999, and so on?
Thanks
Edit: additional information
from concurrent.futures import ThreadPoolExecutor
import json

def fetch_data(self, sequence, batch=1):
    # fetch event list (event list size = 100)
    event_list = self.events(sequence)
    # open file to store data
    f = self.open_file(batch)
    # open a thread pool executor for multi-threading
    with ThreadPoolExecutor(max_workers=4) as executor:
        # self.thread_event_list multi-threads each event in the event list
        # by sending an API request.
        for i, response in enumerate(executor.map(self.thread_event_list, event_list), 1):
            json.dump(response.json(), f)
            f.write("\n")
            # continue from the last sequence number (used in the recursive call)
            last_sequence = response.json()["last_sequence"]
    # Recursive call: use the last sequence number of the
    # event list, and continue with sequence + 1 and batch + 1
    self.fetch_data(
        sequence=last_sequence + 1,
        batch=batch + 1
    )
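A side note on this snippet, not part of the question itself: the handle returned by open_file is never closed, so writes may sit in the buffer until the process exits, and CPython's default recursion limit of 1000 means this recursive pattern will raise RecursionError around batch 1000. A minimal sketch of the same flow with the file managed by a with block (everything else from the question kept as-is):

def fetch_data(self, sequence, batch=1):
    event_list = self.events(sequence)
    # the file object is a context manager, so it is flushed and closed per call
    with self.open_file(batch) as f:
        with ThreadPoolExecutor(max_workers=4) as executor:
            for response in executor.map(self.thread_event_list, event_list):
                json.dump(response.json(), f)
                f.write("\n")
                last_sequence = response.json()["last_sequence"]
    self.fetch_data(sequence=last_sequence + 1, batch=batch + 1)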
import math

# Be sure that batch is never 0, otherwise this will create a file for batch #0 only.
def open_file(self, batch):
    # If you're unsure, this handles it for you.
    if batch < 1:
        batch = 1
    filename = f"data_{math.ceil(batch/1000)}.json"
    f = open(filename, "a")
    return f
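A quick sanity check of how math.ceil maps batch numbers to file indices. Note that it groups batches 1-1000 into file 1 and 1001-2000 into file 2; if you need exactly 1-999 / 1000-1999 as described in the question, batch // 1000 + 1 gives that grouping instead (both shown below):

import math

for batch in (1, 999, 1000, 1001, 2000):
    print(batch, math.ceil(batch / 1000), batch // 1000 + 1)
# batch=1    -> ceil: 1, floor+1: 1
# batch=999  -> ceil: 1, floor+1: 1
# batch=1000 -> ceil: 1, floor+1: 2
# batch=1001 -> ceil: 2, floor+1: 2
# batch=2000 -> ceil: 2, floor+1: 3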
Why not just use the range function?
# batch_name1, batch_name2 and batch_name3 are placeholders for distinct file names
if batch in range(1, 1000):
    filename = f"data_{batch_name1}.json"
elif batch in range(1000, 2000):
    filename = f"data_{batch_name2}.json"
elif batch in range(2000, 3000):
    filename = f"data_{batch_name3}.json"
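These hardcoded ranges stop working at batch 3000, though. A minimal sketch of the same idea generalized with integer division (assuming, as in the question, that 1000 batches should share one file):

def open_file(self, batch):
    # batches 1-999 -> data_1.json, 1000-1999 -> data_2.json, 2000-2999 -> data_3.json, ...
    filename = f"data_{batch // 1000 + 1}.json"
    return open(filename, "a")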