使用 for line in 和 .append in 循环重建、填充 .csv 中的空白

Reconstructing, filling the gaps in .csv using for line in, and .append in loop

请帮帮我,我累了。 看不出为什么如何让它发挥作用。

要解决的问题: .csv 文件必须包含 1sec 数据,例如:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

但它没有。 有些秒不存在,所以我用最后一次看到的数据来做 基本上读一行:

sep = ','    
data = line.split(sep)

data[1] 到 data[9] 保持不变,只有 data[0] 变化 + 1 秒,以填补空白:

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:00,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:01,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:02,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:03,3.294,3.294,3.294,3.294,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:04,3.294,3.295,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:05,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:06,3.293,3.293,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:07,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:08,3.293,3.293,3.293,3.293,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:09,3.292,3.292,3.292,3.292,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

我为此制定了逻辑,只是 .append 让我变了把戏.. 输出是针对源 csv 文件中的每一行,它在输出 csv 文件中产生相同数量的行,但所有记录都具有相同的源文件的最后一行,f, :

time,open,high,low,close,Extremum,Fib 1,Fib 2,Fib 3,l100
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634
2022-04-03 02:00:18,3.289,3.289,3.289,3.289,3.277,3.332898006846162,3.348093581522788,3.357138566449352,3.367449849265634

为什么, 这是代码:

import glob
import datetime
import time
import pandas as pd

# make files sync on sec's

filenames = [i for i in glob.glob("*unique_sorted.csv")]

for filename in filenames:
    coin_name = filename[0:18] 
    print(filename)
    with open (filename, "r") as f:                     
        x = -1
        memory = {}
        memory["data"] = {}
        memory["data"]["time"] = {}
        
        new_file = []
        tolist = {}
        tolist["total"] = {}
        tolist["memory"] = {}
        sep = ','
        start = 1
        for line in f:
            data = line.split(sep)
            if data[0] != "time":
                x = x + 1
                if start == 0 and data[0][0:19] != sec_1_more:
                    memory_time = datetime.datetime.strptime(tolist["memory"]['time'], "%Y-%m-%d %H:%M:%S") # data[0] from previous line
                    read_line_time = datetime.datetime.strptime(data[0][0:19], "%Y-%m-%d %H:%M:%S") # current_line
                    diff = read_line_time - sec_1_more
                    diff_sec = diff.total_seconds()
                    sec = int(diff_sec)
                    i = 1
                    while i < sec:
                        time_for_same_data = memory_time + datetime.timedelta(seconds=1) # 2:00:00 + 1 second
                        time_for_same_data_str = str(time_for_same_data)
                        #2022-04-03 02:00:04,1.4073,1.4073,1.4071,1.4072,1.375,1.4137077251573131,1.4242302135495926,1.4304935994973778,1.437633859477853
                        tolist["total"]['time'] = time_for_same_data_str
                        tolist["total"]['open'] = data[1]
                        tolist["total"]['high'] = data[2]
                        tolist["total"]['low'] = data[3]
                        tolist["total"]['close'] = data[4]
                        tolist["total"]['Extremum'] = data[5]
                        tolist["total"]['Fib 1'] = data[6]
                        tolist["total"]['Fib 2'] = data[7]
                        tolist["total"]['Fib 3'] = data[8]
                        tolist["total"]['l100'] = data[9].strip()
                        new_file.append(tolist["total"])
                        memory_time = memory_time + datetime.timedelta(seconds=1)
                        i = i + 1
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                    
                elif start == 0 and data[0][0:19] == sec_1_more:
                    tolist["total"]['time'] = data[0]
                    tolist["total"]['open'] = data[1]
                    tolist["total"]['high'] = data[2]
                    tolist["total"]['low'] = data[3]
                    tolist["total"]['close'] = data[4]
                    tolist["total"]['Extremum'] = data[5]
                    tolist["total"]['Fib 1'] = data[6]
                    tolist["total"]['Fib 2'] = data[7]
                    tolist["total"]['Fib 3'] = data[8]
                    tolist["total"]['l100'] = data[9].strip()
                    new_file.append(tolist["total"])
                    

                memory["data"]["data"] = str(line)
                memory["data"]["time"] = str(data[0][0:19]) 
                
                tolist["memory"]['time'] = data[0]
                tolist["memory"]['open'] = data[1]
                tolist["memory"]['high'] = data[2]
                tolist["memory"]['low'] = data[3]
                tolist["memory"]['close'] = data[4]
                tolist["memory"]['Extremum'] = data[5]
                tolist["memory"]['Fib 1'] = data[6]
                tolist["memory"]['Fib 2'] = data[7]
                tolist["memory"]['Fib 3'] = data[8]
                tolist["memory"]['l100'] = data[9].strip()
                
                #t = "2022-04-03 02:00:04"
                t = datetime.datetime.strptime(memory["data"]["time"], "%Y-%m-%d %H:%M:%S")
                #sec_1_more = (t + datetime.timedelta(seconds=1)).strftime("%Y-%m-%d %H:%M:%S")
                #or
                sec_1_more = t + datetime.timedelta(seconds=1)
                
                if start == 1:
                    new_file.append(tolist["memory"])
                
                start = 0
                
                if x == 10:
                    #print(new_file)
                    #quit()
                    break # for test to see only 10 first
    f.close() # needed      
    csvData = pd.DataFrame(new_file)
    csvData.to_csv(coin_name+"_unique_sorted_synced.csv", mode="w", index=False)                            
    quit() # coz just one file processing for testing

看来您可以只读入数据,然后使用 asfreq:


# instead of read_clipboard, you'd read it with pd.read_csv
df = pd.read_clipboard(sep=',', parse_dates = ['time']) 

df.set_index('time').asfreq(freq='1S').ffill()

                      open   high    low  close  Extremum     Fib 1     Fib 2     Fib 3     l100
time
2022-04-03 02:00:00  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:01  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:02  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:03  3.294  3.294  3.294  3.294     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:04  3.294  3.295  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:05  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:06  3.293  3.293  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:07  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:08  3.293  3.293  3.293  3.293     3.277  3.332898  3.348094  3.357139  3.36745
2022-04-03 02:00:09  3.292  3.292  3.292  3.292     3.277  3.332898  3.348094  3.357139  3.36745