Python Looping - storing dataframes from .txt file loop, with different lengths

Question

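I want to loop through a bunch of .txt files and process each one (remove columns, change names, handle NaNs, etc.) to get a final dataframe output, df1, which has a specific date, lat, lon, and variables assigned to it. Over the loop, I want to build df_all, which contains all the information from all the files (most likely in date order).

However, each of my dataframes has a different length, and they may share the same date + lat/lon values in those columns.

I've written code to read in and process the files individually, but I'm stuck on how to turn this into a bigger loop (via concat/append...?).

I am trying to end up with one big dataframe (df_all) that contains all of the 'scattered' information from the different files (the df1 outputs). Also, if there are conflicting date and lat/lon entries, I'd like to take the mean. Is this possible in python/pandas?

Any help with any of these questions would be greatly appreciated! Or ideas on how to go about this.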

Answer 1

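Here are some fake tables, read into one big table with a for-loop and concat. Once all the rows have been added to a single big table, you can group together the rows that share the same value in column A and, as an example, take the mean of columns B and C. You should be able to run this code yourself, and I hope it gives you keywords for searching for other questions similar to yours!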

import pandas as pd

# Fake table read-ins for the demo; with real files you'd use pd.read_csv or similar
def fake_read_table(name):
    small_df1 = pd.DataFrame({'A': {0: 5, 1: 1, 2: 3, 3: 1}, 'B': {0: 4, 1: 4, 2: 4, 3: 4}, 'C': {0: 2, 1: 1, 2: 4, 3: 1}})
    small_df2 = pd.DataFrame({'A': {0: 4, 1: 5, 2: 1, 3: 4, 4: 3, 5: 2, 6: 5, 7: 1}, 'B': {0: 3, 1: 1, 2: 1, 3: 1, 4: 5, 5: 1, 6: 4, 7: 2}, 'C': {0: 4, 1: 1, 2: 5, 3: 2, 4: 4, 5: 4, 6: 5, 7: 2}})
    small_df3 = pd.DataFrame({'A': {0: 2, 1: 2, 2: 4, 3: 3, 4: 1, 5: 4, 6: 5}, 'B': {0: 1, 1: 2, 2: 3, 3: 1, 4: 3, 5: 5, 6: 4}, 'C': {0: 5, 1: 2, 2: 3, 3: 3, 4: 5, 5: 4, 6: 5}})
    
    if name == '1.txt':
        return small_df1
    
    if name == '2.txt':
        return small_df2
    
    if name == '3.txt':
        return small_df3


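# Start here
txt_paths = ['1.txt', '2.txt', '3.txt']

# Collect the per-file frames in a list and concatenate once at the end;
# calling pd.concat inside the loop re-copies all earlier rows on every pass
small_dfs = []

for txt_path in txt_paths:
    small_df = fake_read_table(txt_path)

    # .. do some processing you need to do somewhere in here ..

    small_dfs.append(small_df)

big_df = pd.concat(small_dfs, ignore_index=True)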
    
    
#Taking the average B and C values for rows that have the same A value
agg_df = big_df.groupby('A').agg(
    mean_B = ('B','mean'),
    mean_C = ('C','mean'),
).reset_index()

print(agg_df)
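
To map this onto the date + lat/lon case from the question, here is a rough sketch of the same pattern with a multi-column group key. The column names ('date', 'lat', 'lon', 'temp'), the 'data/*.txt' pattern, and the helper names combine_frames and load_all are all made up for illustration, and I'm assuming comma-separated files, so adjust them to whatever your df1 actually looks like.

```python
import glob

import pandas as pd

# Hypothetical key columns; substitute the names your processed df1 uses.
KEY_COLS = ['date', 'lat', 'lon']


def combine_frames(frames):
    """Stack per-file frames and average rows that share date/lat/lon."""
    # Frames of different lengths stack fine; columns missing from a
    # frame are filled with NaN, which mean() skips by default.
    df_all = pd.concat(frames, ignore_index=True)
    # groupby sorts by the key columns, so the result comes out in date order.
    return df_all.groupby(KEY_COLS, as_index=False).mean(numeric_only=True)


def load_all(pattern='data/*.txt'):
    """Read every file matching the (hypothetical) pattern and combine them."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # Assumes comma-separated files that contain a 'date' column.
        df1 = pd.read_csv(path, parse_dates=['date'])
        # .. per-file processing (drop columns, rename, handle NaN) goes here ..
        frames.append(df1)
    return combine_frames(frames)


# Small in-memory demo: two frames of different lengths that collide
# on 2020-01-01 at the same lat/lon.
a = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-02', '2020-01-01']),
    'lat': [10.0, 10.0],
    'lon': [20.0, 20.0],
    'temp': [1.0, 3.0],
})
b = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01']),
    'lat': [10.0],
    'lon': [20.0],
    'temp': [5.0],
})
df_all = combine_frames([a, b])
print(df_all)
```

The two rows that collide on date/lat/lon are collapsed into one row holding their mean, and rows that appear in only one file pass through unchanged. Note that groupby drops rows whose key columns are NaN by default, so fill or filter those during the per-file processing if they matter to you.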

python

variables

loops

pandas

spyder