如何根据 Python 上的时间戳正确地将 df 中的行附加到另一个? Pandas 相关

How to correctly append rows from a df to another based on timestamps on Python? Pandas related

我目前有 253 个 csv 文件,文件名如下:

1. ALICEUSDT-1h-2021-06-01.csv
2. ALICEUSDT-1h-2021-06-02.csv
3. ALICEUSDT-1h-2021-06-03.csv
4. ALICEUSDT-1h-2021-06-06.csv
5. ALICEUSDT-1h-2021-06-09.csv
6. ALICEUSDT-1h-2021-06-11.csv
7. ALICEUSDT-1h-2021-06-12.csv
.
.
.
253. ALICEUSDT-1h-2022-02-13.csv

位于此路径:C:\Users\StoreX\AliceUSDT-Mark_Prices_Klines_1h_Timeframe

这些文件中的每一个都包含特定资产的每小时价格行为数据,总共有 24 行(没有列名也没有索引

这些数据帧中的任何一个看起来像这样:

    1623369600000   5.53548967  5.8         5.4473946   5.4719309   0   1623373199999   0   3600    0   0   0
    1623373200000   5.47278238  5.53195431  5.37178569  5.41425374  0   1623376799999   0   3600    0   0   0
    1623376800000   5.41425372  5.439       5.30612058  5.43896918  0   1623380399999   0   3597    0   0   0
    1623380400000   5.439       5.573       5.439       5.51534579  0   1623383999999   0   3600    0   0   0
    1623384000000   5.51534611  5.75008683  5.46536508  5.699       0   1623387599999   0   3600    0   0   0
    1623387600000   5.699       5.7319193   5.59313511  5.63882188  0   1623391199999   0   3582    0   0   0
    1623391200000   5.63892904  5.65847764  5.50099143  5.52530541  0   1623394799999   0   3600    0   0   0
    1623394800000   5.52201075  5.69104913  5.49124857  5.658       0   1623398399999   0   3600    0   0   0
    1623398400000   5.6554082   6.336       5.56700781  6.2605508   0   1623401999999   0   3600    0   0   0
    1623402000000   6.26056029  6.57769402  6.26056029  6.392       0   1623405599999   0   3600    0   0   0
    1623405600000   6.3935684   6.68259312  6.30951189  6.582       0   1623409199999   0   3600    0   0   0
    1623409200000   6.58        6.633       6.41970848  6.55150417  0   1623412799999   0   3600    0   0   0
    1623412800000   6.55051699  6.56212633  6.34272711  6.44329682  0   1623416399999   0   3600    0   0   0
    1623416400000   6.43806578  6.63709121  6.40735932  6.49129756  0   1623419999999   0   3600    0   0   0
    1623420000000   6.49097939  6.498       6.19692729  6.25924944  0   1623423599999   0   3600    0   0   0
    1623423600000   6.25924944  6.2822904   6.05089683  6.13788357  0   1623427199999   0   3600    0   0   0
    1623427200000   6.13764793  6.228       6.005       6.015432    0   1623430799999   0   3600    0   0   0
    1623430800000   6.015       6.198       5.97772068  6.19544081  0   1623434399999   0   3600    0   0   0
    1623434400000   6.198       6.349       6.13209062  6.2308081   0   1623437999999   0   3600    0   0   0
    1623438000000   6.23083904  6.42623752  6.201       6.27776139  0   1623441599999   0   3600    0   0   0
    1623441600000   6.27776137  6.31281154  6.02227423  6.036       0   1623445199999   0   3600    0   0   0
    1623445200000   6.033       6.185       5.90345962  6.119       0   1623448799999   0   3600    0   0   0
    1623448800000   6.119       6.298       6.047       6.20634564  0   1623452399999   0   3600    0   0   0
    1623452400000   6.20606916  6.20606916  5.9779765   6.05824449  0   1623455999999   0   3600    0   0   0

The Unix Timestamp is used to set the exact date at which the price was opened (2nd column), and also to set the exact date at which it was closed (5th column), the columns with 0 and ≈3600 values do not matter.

The 2nd column corresponds to the Open price

The 3rd column corresponds to the High price

The 4th column corresponds to the Low price

The 5th column corresponds to the Close price

我想导出一个包含这些文件中所有行的大 df,为此,我编写了以下代码:

import pandas as pd
import os

def check_path(infile):
    return os.path.exists(infile)   

first_entry = input('Tell me the path where your csv files are located at: ')

while True:
    
    if check_path(first_entry) == False:
        print('\n')
        print('This PATH is invalid!')
        first_entry = input('Tell me the RIGHT PATH in which your csv files are located: ')
        
    elif check_path(first_entry) == True:
        print('\n')
        final_output = first_entry
        break

big_df = pd.DataFrame() 
for name in os.listdir(first_entry):
    if name.endswith(".csv"):
        print(name)
        current_df_path = first_entry+"\"+name
        df = pd.read_csv(current_df_path, header=None)
        big_df.append(df)
        print(big_df)
print("")        
print("the final df is the following")
print("")
print(big_df)

但是在运行之后完全没有用,输出如下:

.

.

.

ALICEUSDT-1h-2022-02-12.csv Empty DataFrame Columns: [] Index: [] ALICEUSDT-1h-2022-02-13.csv Empty DataFrame Columns: [] Index: []

the final df is the following

Empty DataFrame Columns: [] Index: []

所以,我想知道我做错了什么?为什么 big_df 实际上没有充满信息?

最后,如何改进上面的代码来满足要求?

追加 returns 数据框,因此您需要编写:

big_df = big_df.append(df)

您可以添加 ignore_index=True 来重置索引

此外,read_csv 的默认分隔符是逗号,因此请指出您的文件中使用的逗号,例如 sep='\s+' 如果是空格,'\t' 如果是制表符,...

您可能希望从仅在数据框中加载一个文件开始,以验证文件路径是否正确以及是否一切正常。