Keep CSV file's comment lines in pandas?

Question

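I have just started exploring the world of pandas, and the first odd CSV file I ran into has two comment lines at the top (each with a different number of columns).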

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

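I know how to skip these lines with skiprows or header=, but how can I keep the comments instead when using read_csv? Sometimes the comments are needed as metadata about the file, and I don't want to throw them away.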

Answer 1

pandas is designed for reading structured data.

For unstructured data, just use the built-in open together with the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # first comment line, as a list of fields
    row2 = next(reader)  # second comment line, as a list of fields
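
As a minimal, self-contained sketch of the same idea, the snippet below runs against an in-memory copy of the sample from the question. The `sample` string is an assumption standing in for `file.csv`, with the data line simplified. Passing `skipinitialspace=True` strips the space that follows each comma.

```python
import csv
import io

# Hypothetical stand-in for the file described in the question.
sample = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "*header*\n"
    "actual data\n"
)

with io.StringIO(sample) as f:
    reader = csv.reader(f, skipinitialspace=True)
    comment_rows = [next(reader) for _ in range(2)]  # the two comment lines
    remaining_rows = list(reader)                    # header line plus data rows

print(comment_rows)
print(remaining_rows)
```

Because the csv module does not require every row to have the same number of fields, the uneven comment lines are not a problem here.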

You can attach a string to the DataFrame like this:

df.comments = 'My Comments'

But note that while you can attach attributes to a DataFrame, operations performed on it (such as groupby, pivot, join or loc, to name just a few) may return a new DataFrame without the metadata attached. pandas does not yet have a robust method of propagating metadata attached to DataFrames.
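
The caveat can be illustrated with a small sketch. The column name `value` and the metadata contents are made up for the example. The last lines use `DataFrame.attrs`, a dictionary that pandas documents as experimental; whether it survives a given operation is not guaranteed, so treat it as a convenience rather than a robust mechanism.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]})  # hypothetical data

# Plain Python attribute, as in the answer above.
df.comments = "My Comments"

# Boolean selection with loc returns a new DataFrame object,
# and the custom attribute is not carried over to it.
subset = df.loc[df["value"] > 1]
print(hasattr(df, "comments"))      # True
print(hasattr(subset, "comments"))  # False

# Alternative: store the metadata in the (experimental) attrs dict.
df.attrs["comments"] = [["sometext", "sometext2"]]
print(df.attrs)
```

If the comments must stay with the data through a longer pipeline, keeping them in a separate variable next to the DataFrame is the most predictable option.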

Answer 2

You can read the metadata first and then call read_csv on the same file handle:

import pandas as pd

with open('f.csv') as file:
    # Read the first 2 lines as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # The handle is now positioned after the comments,
    # so read_csv starts at the header line
    df = pd.read_csv(file)
    print(df)
    #       *header*
    # 0  actual data
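
The same approach can be wrapped in a small helper. This is only a sketch: the function name `read_csv_with_comments` and its `n_comment_lines` parameter are hypothetical, and it assumes the comments always occupy a known number of lines at the top of the file. An in-memory `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def read_csv_with_comments(handle, n_comment_lines=2, **read_csv_kwargs):
    """Return (comments, DataFrame) from an open text-mode file handle.

    Hypothetical helper: reads the leading comment lines verbatim, then
    lets pandas parse whatever remains in the handle.
    """
    comments = [handle.readline().rstrip("\r\n") for _ in range(n_comment_lines)]
    df = pd.read_csv(handle, **read_csv_kwargs)
    return comments, df


# Stand-in for the file from the question (data line simplified).
sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "*header*\n"
    "actual data\n"
)

comments, df = read_csv_with_comments(sample)
print(comments)
print(df)
```

With a real file, the call would look like `with open('f.csv') as file: comments, df = read_csv_with_comments(file)`.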

Tags: python, pandas, csv, import, comments