Reconstructing and filling the gaps in a .csv using for line in and .append in a loop
Please help me, I'm tired. I can't see why this doesn't work or how to make it work.
The problem to solve: the .csv file must contain 1-second data, for example:
time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
But it doesn't. Some seconds are missing, so I want to fill them with the last seen data. Basically I read a line:
sep = ','
data = line.split(sep)
data[1] through data[9] stay the same; only data[0] changes, incremented by 1 second, to fill the gaps (a small sketch of this follows the listing below):
time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:01,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:02,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:03,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:06,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
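A minimal sketch of that per-line idea (the example line is taken from the sample above; the variable names are illustrative, not from the actual script):

import datetime

sep = ','
line = "2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634"
data = line.strip().split(sep)

# bump only the timestamp (data[0]); keep data[1] to data[9] unchanged
t = datetime.datetime.strptime(data[0][0:19], "%Y-%m-%d %H:%M:%S")
next_second = str(t + datetime.timedelta(seconds=1))
filler_row = sep.join([next_second] + data[1:])
print(filler_row)  # 2022-04-03 02:00:01,3.294,... (same values, time + 1 second)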
I have worked out the logic for this, but .append is playing tricks on me (see the note after the code below). The output csv ends up with the same number of rows as the source csv, but every record is identical to the last line read from the source file f:
time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
Why?
Here is the code:
import glob
import datetime
import time
import pandas as pd

# make files sync on sec's
filenames = [i for i in glob.glob("*unique_sorted.csv")]
for filename in filenames:
    coin_name = filename[0:18]
    print(filename)
    with open(filename, "r") as f:
        x = -1
        memory = {}
        memory["data"] = {}
        memory["data"]["time"] = {}
        new_file = []
        tolist = {}
        tolist["total"] = {}
        tolist["memory"] = {}
        sep = ','
        start = 1
        for line in f:
            data = line.split(sep)
            if data[0] != "time":
                x = x + 1
                if start == 0 and data[0][0:19] != sec_1_more:
                    memory_time = datetime.datetime.strptime(tolist["memory"]['time'], "%Y-%m-%d %H:%M:%S")  # data[0] from previous line
                    read_line_time = datetime.datetime.strptime(data[0][0:19], "%Y-%m-%d %H:%M:%S")  # current line
                    diff = read_line_time - sec_1_more
                    diff_sec = diff.total_seconds()
                    sec = int(diff_sec)
                    i = 1
                    while i < sec:
                        time_for_same_data = memory_time + datetime.timedelta(seconds=1)  # 2:00:00 + 1 second
                        time_for_same_data_str = str(time_for_same_data)
                        # 2022-04-03 02:00:04,1.4073,1.4073,1.4071,1.4072,1.375,1.4137077251573131,1.4242302135495926,1.4304935994973778,1.437633859477853
                        tolist["total"]['time'] = time_for_same_data_str
                        tolist["total"]['open'] = data[1]
                        tolist["total"]['high'] = data[2]
                        tolist["total"]['low'] = data[3]
                        tolist["total"]['close'] = data[4]
                        tolist["total"]['Extremum'] = data[5]
                        tolist["total"]['Fib 1'] = data[6]
                        tolist["total"]['Fib 2'] = data[7]
                        tolist["total"]['Fib 3'] = data[8]
                        tolist["total"]['l100'] = data[9].strip()
                        new_file.append(tolist["total"])
                        memory_time = memory_time + datetime.timedelta(seconds=1)
                        i = i + 1
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                elif start == 0 and data[0][0:19] == sec_1_more:
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                memory["data"]["data"] = str(line)
                memory["data"]["time"] = str(data[0][0:19])
                tolist["memory"]['time'] = data[0]
                tolist["memory"]['open'] = data[1]
                tolist["memory"]['high'] = data[2]
                tolist["memory"]['low'] = data[3]
                tolist["memory"]['close'] = data[4]
                tolist["memory"]['Extremum'] = data[5]
                tolist["memory"]['Fib 1'] = data[6]
                tolist["memory"]['Fib 2'] = data[7]
                tolist["memory"]['Fib 3'] = data[8]
                tolist["memory"]['l100'] = data[9].strip()
                # t = "2022-04-03 02:00:04"
                t = datetime.datetime.strptime(memory["data"]["time"], "%Y-%m-%d %H:%M:%S")
                # sec_1_more = (t + datetime.timedelta(seconds=1)).strftime("%Y-%m-%d %H:%M:%S")
                # or
                sec_1_more = t + datetime.timedelta(seconds=1)
                if start == 1:
                    new_file.append(tolist["memory"])
                    start = 0
                if x == 10:
                    # print(new_file)
                    # quit()
                    break  # for test, to see only the first 10
    f.close()  # needed
    csvData = pd.DataFrame(new_file)
    csvData.to_csv(coin_name + "_unique_sorted_synced.csv", mode="w", index=False)
    quit()  # coz just one file processing for testing
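A side note on the repeated rows (my reading of the code above, not something stated in the original post): new_file.append(tolist["total"]) appends a reference to the same dict object on every iteration, so each later assignment also rewrites every entry already in the list, and the result is N copies of the last processed line. A minimal sketch of that behaviour:

row = {}
rows = []
for t in ["02:00:00", "02:00:01", "02:00:02"]:
    row["time"] = t    # mutates the one and only dict object
    rows.append(row)   # appends a reference to it, not a copy
print(rows)
# [{'time': '02:00:02'}, {'time': '02:00:02'}, {'time': '02:00:02'}]
# appending a copy instead (rows.append(dict(row))) keeps each entry distinct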
It looks like you can just read the data in and then use asfreq:
# instead of read_clipboard, you'd read it with pd.read_csv
df = pd.read_clipboard(sep=',', parse_dates = ['time'])
df.set_index('time').asfreq(freq='1S').ffill()
open high low close Extremum Fib 1 Fib 2 Fib 3 l100
time
2022-04-03 02:00:00 3.294 3.294 3.294 3.294 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:01 3.294 3.294 3.294 3.294 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:02 3.294 3.294 3.294 3.294 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:03 3.294 3.294 3.294 3.294 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:04 3.294 3.295 3.292 3.292 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:05 3.293 3.293 3.292 3.292 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:06 3.293 3.293 3.292 3.292 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:07 3.293 3.293 3.293 3.293 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:08 3.293 3.293 3.293 3.293 3.277 3.332898 3.348094 3.357139 3.36745
2022-04-03 02:00:09 3.292 3.292 3.292 3.292 3.277 3.332898 3.348094 3.357139 3.36745
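For completeness, a sketch of how that could replace the original file loop (assuming the same *unique_sorted.csv naming and unique, sorted timestamps; the output filename mirrors the original script):

import glob
import pandas as pd

for filename in glob.glob("*unique_sorted.csv"):
    coin_name = filename[0:18]
    df = pd.read_csv(filename, parse_dates=['time'])
    # reindex to a 1-second grid and forward-fill the missing seconds
    filled = df.set_index('time').asfreq(freq='1S').ffill()
    filled.reset_index().to_csv(coin_name + "_unique_sorted_synced.csv", index=False)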