Load a .txt file into a Pandas DataFrame with separator lines between text blocks

Question

I have a text file containing text like the following:

--------------------------------
I hate apples and love oranges.
He likes to ride bike.
--------------------------------

--------------------------------
He is a man of honour. 
She loves to travel.
--------------------------------

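I want to load this txt file into a pandas DataFrame, with each row containing only the content between a pair of separator lines. For example:

Row 1 should look like this: I hate apples and love oranges. He likes to ride bike.

Row 2 should look like this: He is a man of honour. She loves to travel.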

Answer 1

It looks like you need to preprocess the text before building the DataFrame.

Try:

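import pandas as pd

filename = "data.txt"  # placeholder path; point this at your text file

res = []   # one joined string per block
temp = []  # lines of the block currently being read
with open(filename) as infile:
    for line in infile:
        val = line.strip()
        if not val:
            continue  # skip blank lines
        if not val.startswith("-"):
            temp.append(val)
        elif temp:
            # A separator line closes the current block
            res.append(" ".join(temp))
            temp = []

# Keep the last block if the file does not end with a separator line
if temp:
    res.append(" ".join(temp))

df = pd.DataFrame(res, columns=["Test"])
print(df)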

Output:

                                                Test
0  I hate apples and love oranges. He likes to ri...
1        He is a man of honour. She loves to travel.
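
As an alternative to the line-by-line loop, the same preprocessing can be sketched with a regular-expression split on the whole text. This is only an illustration under stated assumptions: a separator is taken to be a line consisting solely of three or more dashes, and the names `SEPARATOR`, `split_blocks`, and `sample` are hypothetical and do not come from the question.

```python
import re

# Assumption: a separator is a line made up only of dashes (three or more),
# optionally surrounded by spaces or tabs.
SEPARATOR = re.compile(r"^[ \t]*-{3,}[ \t]*$", flags=re.MULTILINE)


def split_blocks(text):
    """Return one joined string per block of text between separator lines."""
    blocks = []
    for chunk in SEPARATOR.split(text):
        # Drop blank lines and surrounding whitespace inside each chunk
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if lines:
            blocks.append(" ".join(lines))
    return blocks


# Sample text copied from the question
sample = """\
--------------------------------
I hate apples and love oranges.
He likes to ride bike.
--------------------------------

--------------------------------
He is a man of honour.
She loves to travel.
--------------------------------
"""

rows = split_blocks(sample)
print(rows)
```

The resulting list can then be passed to pd.DataFrame(rows, columns=["Test"]) exactly as in the answer above. Matching whole dash-only lines, rather than any line that starts with "-", avoids treating a text line such as "- a bullet" as a separator.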

