Keep CSV file's comment lines in pandas?

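I've just started exploring the world of pandas, and the first odd CSV file I came across starts with two comment lines (with different numbers of columns):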

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

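I know how to skip these lines with skiprows or header=, but how can I instead keep these comments when using read_csv? Sometimes the comments are needed as file metadata, and I don't want to throw them away.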

pandas is designed for reading structured data.

For unstructured data, just use the built-in open together with the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    row2 = next(reader)  # gets the second line
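A self-contained sketch of the same idea, using an in-memory string in place of `file.csv` (the sample contents are assumed from the question). `csv.reader` has no problem with rows of different lengths, so the two comment lines come back as lists of two and three fields; `skipinitialspace=True` drops the space after each comma.

```python
import csv
import io

# Stand-in for file.csv: two comment lines of different widths, then the table.
sample = "sometext, sometext2\nmoretext, moretext1, moretext2\n*header*\nactual data\n"

with io.StringIO(sample) as f:
    # skipinitialspace=True removes the blank that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    row1 = next(reader)  # first comment line
    row2 = next(reader)  # second comment line

print(row1)  # ['sometext', 'sometext2']
print(row2)  # ['moretext', 'moretext1', 'moretext2']
```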

You can attach a string to a DataFrame like this:

df.comments = 'My Comments'

But note:

Note, however, that while you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
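A small sketch of that caveat, assuming pandas 1.0 or later: a plain attribute such as `comments` is not carried over to the new DataFrame returned by `copy()`, whereas the `DataFrame.attrs` dictionary (documented by pandas as experimental) is copied. Whether `attrs` survives other operations such as `groupby` or `merge` depends on the pandas version, so check it before relying on it.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]})

# Plain attribute: stored on this object only.
df.comments = "My Comments"

# DataFrame.attrs: a dict that pandas reserves for metadata (experimental API).
df.attrs["comments"] = ["sometext, sometext2", "moretext, moretext1, moretext2"]

copied = df.copy()
print(hasattr(copied, "comments"))  # False: the plain attribute is lost
print(copied.attrs["comments"])     # the attrs entry is still there
```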

Alternatively, you can read the metadata first and then call read_csv on the same file handle:

import pandas as pd

with open('f.csv') as file:
    # read the first 2 lines as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # read_csv continues from the current position, i.e. the third line
    df = pd.read_csv(file)
    print(df)
    #       *header*
    # 0  actual data
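The same approach wrapped in a reusable function and run on an in-memory file with made-up sample data. `read_csv_with_meta` and its `n_meta` parameter are hypothetical names for this sketch; it assumes the number of comment lines is known in advance, and it also strips the whitespace around each field that the snippet above leaves in place (`' sometext2'`).

```python
import io

import pandas as pd


def read_csv_with_meta(handle, n_meta=2):
    """Read n_meta leading comment lines, then parse the rest with read_csv.

    Returns (meta, df), where meta holds one list of fields per comment line.
    """
    meta = []
    for _ in range(n_meta):
        line = handle.readline()
        # Split on commas; strip() also removes the trailing newline.
        meta.append([field.strip() for field in line.split(",")])
    # read_csv starts from the handle's current position, after the comments.
    df = pd.read_csv(handle)
    return meta, df


sample = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
)

meta, df = read_csv_with_meta(io.StringIO(sample))
print(meta)  # [['sometext', 'sometext2'], ['moretext', 'moretext1', 'moretext2']]
print(df)
```

Reading the comments through the same handle means the file is opened and scanned only once, and read_csv never sees the ragged lines.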