How to read and write mixed metadata-data files using pandas

Question

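Scientific data often comes with a metadata section before the data section. I want to read a CSV file like the example below, splitting off the first 5 lines as a metadata 'header' and doing calculations on the rest: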

Source: whosebug.com
Citation: Whosebug et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21

col_1	col_2	col_3
a	0	3
b	1	9
c	4	-2

When I'm done, I want to write the dataset back out with the metadata 'header' on top, so the original file structure is preserved:

Source: whosebug.com
Citation: Whosebug et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21

col_1	col_2	col_3	col_4
a	0	3	3
b	1	9	10
c	4	-2	2
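The expected output implies col_4 = col_2 + col_3 (0 + 3 = 3, 1 + 9 = 10, 4 + (-2) = 2). Below is a minimal sketch of that read/compute/write cycle on the example as shown. It assumes the data part is tab-separated, as displayed, and that the 5 metadata lines are the three text lines plus two empty lines; the scrape dropped those two lines, and the answer below fills them with ",,,". Both are assumptions, not details confirmed by the question.

```python
import io

import pandas as pd

# The question's input file; tab separators and the two empty metadata lines are assumptions.
raw = (
    "Source: whosebug.com\n"
    "Citation: Whosebug et al. 2021: How to import and export mixed metadata - data files using pandas.\n"
    "Date: 17.02.21\n"
    "\n"
    "\n"
    "col_1\tcol_2\tcol_3\n"
    "a\t0\t3\n"
    "b\t1\t9\n"
    "c\t4\t-2\n"
)

N_META = 5  # number of metadata lines at the top of the file

lines = raw.splitlines(keepends=True)
header = "".join(lines[:N_META])  # kept verbatim so it can be written back unchanged
df = pd.read_csv(io.StringIO("".join(lines[N_META:])), sep="\t")

# New column as in the expected output.
df["col_4"] = df["col_2"] + df["col_3"]

# Write the metadata first, then the modified table.
out = io.StringIO()
out.write(header)
df.to_csv(out, sep="\t", index=False)
print(out.getvalue())
```

Keeping the header as raw text, rather than parsing it, is what lets it round-trip unchanged regardless of its format.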

Answer 1

Not sure why the newlines were escaped, so I removed the escaping in the sample data.

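Open the file and read its contents
Take the first five lines as the metadata header
Do the DataFrame operations
Save the result back to the file, writing the metadata first and then the DataFrame contents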

import io
from pathlib import Path

import pandas as pd

filetext = """Source: whosebug.com
Citation: Whosebug et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
,,,
,,,
col_1,col_2,col_3
a,0,3
b,1,9
c,4,-2"""

p = Path.cwd().joinpath("so_science.txt")
with open(p, "w") as f:
    f.write(filetext)

# get file contents
with open(p, "r") as f:
    fc = f.read()

# first five lines are metadata
lines = fc.split("\n")
header = "\n".join(lines[:5])
# the rest is a CSV
df = pd.read_csv(io.StringIO("\n".join(lines[5:])))
# modify DF: add col_4 = col_2 + col_3, as in the expected output
df["col_4"] = df["col_2"] + df["col_3"]

# write out the metadata, then the CSV
# (newline="" as pandas recommends for file handles passed to to_csv)
with open(p, "w", newline="") as f:
    f.write(f"{header}\n")
    df.to_csv(f, index=False)
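A variant of the same approach (a sketch, not the answerer's code): instead of reading the whole file and splitting on "\n", read the metadata lines with readline() and pass the same open file object to pd.read_csv, which continues parsing from the handle's current position. The helper names read_with_header and write_with_header are hypothetical, and the lineterminator argument assumes pandas 1.5 or later (older versions spell it line_terminator).

```python
import os
import tempfile

import pandas as pd


def read_with_header(path, n_meta=5):
    """Hypothetical helper: return (header_lines, df) for a file with n_meta metadata lines."""
    with open(path, "r") as f:
        # readline() keeps each line's "\n", so the header can be written back verbatim
        header_lines = [f.readline() for _ in range(n_meta)]
        # read_csv parses from the current position, i.e. just after the header
        df = pd.read_csv(f)
    return header_lines, df


def write_with_header(path, header_lines, df):
    """Hypothetical helper: write the header lines unchanged, then the DataFrame as CSV."""
    # newline="" avoids doubled "\r" on Windows; lineterminator="\n" matches the header lines
    with open(path, "w", newline="") as f:
        f.writelines(header_lines)
        df.to_csv(f, index=False, lineterminator="\n")


sample = (
    "Source: whosebug.com\n"
    "Citation: Whosebug et al. 2021: How to import and export mixed metadata - data files using pandas.\n"
    "Date: 17.02.21\n"
    ",,,\n"
    ",,,\n"
    "col_1,col_2,col_3\n"
    "a,0,3\n"
    "b,1,9\n"
    "c,4,-2\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "so_science.txt")
    with open(path, "w", newline="") as f:
        f.write(sample)

    header_lines, df = read_with_header(path, n_meta=5)
    df["col_4"] = df["col_2"] + df["col_3"]
    write_with_header(path, header_lines, df)

    with open(path) as f:
        result = f.read()

print(result)
```

Reading line by line means the metadata never has to be re-joined, and the data section is parsed straight from disk rather than from an in-memory copy of the whole file.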


Tags: python, metadata, dataset, pandas