What would be the best way to convert a text file to a pandas dataframe?

Question

I have a text file that basically looks like this:

Number|Name|Report
58|John|John is great

John is good

I like John
[Report Ends]

This block repeats over and over for different people.

I want to turn it into a dataframe like the following:

Number Name Report
58     John John is great John is good I like John [Report Ends]

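Using the line pd.read_csv('/Path', sep="|", header=0) I get the correct column names, and the first row is correct up until the "Report" column. I think the "Report" part is messing everything up because it spans several lines in the text file. How should I get the report data into the dataframe?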

Answer 1

With a few lines of manual parsing, you can extract the information and reshape it before reading it into a dataframe.

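import pandas as pd

# Read all lines of the file; the first line is the header row.
with open('info.txt', 'r') as fp:
    info = fp.readlines()

df_dicts = []
cd = None
for line in info[1:]:  # skip the "Number|Name|Report" header
    line = line.strip()  # drop the trailing newline and surrounding whitespace
    if not line:
        continue  # skip blank lines so they don't add extra spaces to the report
    if '|' in line:
        # A line containing '|' starts a new record
        cd = {}
        df_dicts.append(cd)
        # maxsplit=2 keeps any further '|' inside the first report line
        cd['Number'], cd['Name'], cd['Report'] = line.split('|', 2)
    else:
        # Any other line continues the current record's report
        cd['Report'] += " " + line

print(pd.DataFrame(df_dicts))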
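If you would rather keep using pd.read_csv as in the question, a variation on the same idea is to stitch each record's continuation lines onto the line that starts the record, and only then hand the text to read_csv. This is a minimal sketch, assuming every record begins with a line containing `|` and no continuation line contains `|`. The helper `stitch_records` and the `SAMPLE` text (including the second person, "Mary") are hypothetical, made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample text in the same layout as the question's file.
SAMPLE = """Number|Name|Report
58|John|John is great

John is good

I like John
[Report Ends]
59|Mary|Mary is kind

Mary is punctual
[Report Ends]
"""


def stitch_records(text: str) -> str:
    """Hypothetical helper: join each record's continuation lines onto its first line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between report paragraphs
        if '|' in line or not records:
            records.append(line)  # header line, or the start of a new record
        else:
            records[-1] += " " + line  # continuation of the current report
    return "\n".join(records)


# Each record is now a single "|"-separated line, so read_csv parses it normally.
df = pd.read_csv(io.StringIO(stitch_records(SAMPLE)), sep="|", header=0)
print(df)
```

For a real file you would pass the file's contents instead, e.g. `stitch_records(open('info.txt').read())`. A side benefit of going through read_csv is that it infers dtypes, so `Number` comes out as an integer column rather than strings.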

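If this simple line matching turns out to be too general for your data (for example, if a report line itself contains a `|`), you'll have to start looking into regular expressions.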
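As a rough illustration of the regular-expression route, the sketch below treats a record as starting at any line that begins with digits followed by `|`, and running up to the next such line or the end of the text. That record-start rule is an assumption about the file format, not something stated in the question; the `RECORD` pattern and the `SAMPLE` text (including "Mary") are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample text in the same layout as the question's file.
SAMPLE = """Number|Name|Report
58|John|John is great

John is good

I like John
[Report Ends]
59|Mary|Mary is kind

Mary is punctual
[Report Ends]
"""

# Assumed record layout: "<digits>|<name>|<report...>", where the report may span
# several lines and ends right before the next "<digits>|" line or at end of text.
# re.M makes ^ match at every line start; re.S lets '.' match newlines.
RECORD = re.compile(r"^(\d+)\|([^|\n]*)\|(.*?)(?=^\d+\||\Z)", re.M | re.S)

rows = [
    # " ".join(report.split()) collapses newlines and blank lines into single spaces
    {"Number": int(num), "Name": name, "Report": " ".join(report.split())}
    for num, name, report in RECORD.findall(SAMPLE)
]
df = pd.DataFrame(rows)
print(df)
```

The header line is skipped automatically because it does not start with a digit. Because the report is allowed to contain `|`, this is more tolerant than the `'|' in line` test, although a report line that itself begins with digits and `|` would still be mistaken for a new record.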
