Keep CSV file's comment lines in pandas?
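I've just started exploring the world of pandas, and the first odd CSV file I came across has two comment lines at the beginning (with different column widths).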
sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------
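I know how to skip these lines with skiprows or header=, but how can I instead keep these comments when using read_csv? Sometimes the comments are needed as file meta-information, and I don't want to throw them away.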
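Pandas is designed for reading structured data.
For unstructured data, just use the built-in open:
import csv

with open('file.csv') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line as a list of fields
    row2 = next(reader)  # gets the second line as a list of fields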
You can attach a string to the DataFrame like this:
df.comments = 'My Comments'
But note:
Note, however, that while you can attach attributes to a DataFrame,
operations performed on the DataFrame (such as groupby, pivot, join or
loc to name just a few) may return a new DataFrame without the
metadata attached. Pandas does not yet have a robust method of
propagating metadata attached to DataFrames.
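As a rough illustration of the caveat above, the sketch below stores the comments twice: once as a plain Python attribute, and once in `DataFrame.attrs`, a dictionary that pandas provides for dataset-level metadata and documents as experimental. The sample frame and the `comments` key are made up for the example. Whether either form survives a given operation can depend on the pandas version, so the sketch prints what a derived frame carries instead of assuming a result.

```python
import pandas as pd

# Hypothetical sample frame, only for illustration.
df = pd.DataFrame({"value": [1, 2, 3]})

# Plain Python attribute: it is set on this particular object only.
df.comments = "My Comments"

# DataFrame.attrs: a dict for dataset-level metadata (experimental in pandas).
df.attrs["comments"] = [
    ["sometext", "sometext2"],
    ["moretext", "moretext1", "moretext2"],
]

# Inspect a derived frame rather than assuming the metadata came along.
derived = df.copy()
print(hasattr(derived, "comments"))
print(derived.attrs.get("comments"))
```

If the metadata matters downstream, keeping it in a separate variable next to the DataFrame is the more conservative choice.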
You can read the metadata first and then use read_csv:
import pandas as pd

with open('f.csv') as file:
    # read the first 2 rows as metadata
    header = [file.readline() for x in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # the file position is now past the comments, so read_csv starts at *header*
    df = pd.read_csv(file)
    print(df)
    #   *header*
    # 0 actual data
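The same idea can be wrapped in a small helper. This is only a sketch: the names `split_leading_lines` and `read_csv_with_meta`, the `n_meta` parameter and the `comments` key are hypothetical, and it assumes the number of comment lines is known in advance. Unlike the snippet above, it strips whitespace from each cell. pandas is imported inside the reader function so that the splitting step can be tried on its own.

```python
import io


def split_leading_lines(text, n_meta):
    """Split text into (parsed leading lines, remaining CSV text)."""
    lines = text.splitlines(keepends=True)
    # Each leading line becomes a list of whitespace-stripped cells.
    meta = [[cell.strip() for cell in line.split(",")] for line in lines[:n_meta]]
    body = "".join(lines[n_meta:])
    return meta, body


def read_csv_with_meta(path, n_meta=2, **kwargs):
    """Read a CSV whose first n_meta lines are comments; return (meta, df)."""
    import pandas as pd  # imported here so the helper above works without pandas

    with open(path) as f:
        meta, body = split_leading_lines(f.read(), n_meta)
    # Parse only the part after the comments; kwargs go straight to read_csv.
    df = pd.read_csv(io.StringIO(body), **kwargs)
    df.attrs["comments"] = meta  # experimental; may not survive later operations
    return meta, df


sample = "sometext, sometext2\nmoretext, moretext1, moretext2\n*header*\nactual data\n"
meta, body = split_leading_lines(sample, 2)
print(meta)
print(body)
```

Returning `meta` alongside the DataFrame keeps the comments available even if a later operation drops `attrs`.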