Processing text in a csv document

Question

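I am starting to do text analysis on some csv documents. However, my csv document contains several sentences I am not interested in, so I want to write Python code that analyzes the csv document and keeps only the sentences containing more than 5 words for my analysis. I don't know where to start writing the code and need some help.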

Example:

Input file: [image]

Output file: [image]

Answer 1

This should work (Python 3.5):

import csv

finalLines = []
toRemove = ['a', 'in', 'the']  # words to drop from each sentence

with open('export.csv', newline='') as f:
    # Assumes the sentence is in the first column of each row
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        # Keep only the words that are not in toRemove
        words = [word for word in row[0].split() if word not in toRemove]
        finalLines.append(' '.join(words))

print(finalLines)
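
If the goal is to keep only sentences with more than 5 words, as the question describes, a minimal sketch using the standard csv module might look like the following. It assumes each sentence sits in the first column of its row. The function name keep_long_sentences, the min_words parameter, and the sample data are hypothetical and only there for illustration.

```python
import csv
import io


def keep_long_sentences(csv_file, min_words=5):
    """Return first-column sentences that contain more than min_words words."""
    kept = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        sentence = row[0].strip()
        # Count whitespace-separated words and keep only the longer sentences
        if len(sentence.split()) > min_words:
            kept.append(sentence)
    return kept


# Made-up sample data standing in for export.csv (an assumption for this sketch)
sample = io.StringIO(
    "the quick brown fox jumps over the lazy dog\n"
    "too short\n"
    "this sentence has exactly six words\n"
)
print(keep_long_sentences(sample))
```

With a real file, the same function could be called inside a with open('export.csv', newline='') as f: block by passing f in place of the sample object.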

Answer 2

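If you use pandas (a Python library widely used for data processing), you can get the job done easily and efficiently. Here is the link to the official pandas documentation: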

http://pandas.pydata.org/pandas-docs/stable/

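Note: pandas has a built-in function for reading csv files (read_csv). You can use its 'skiprows' parameter to skip the rows you don't want, or apply a regular expression to filter the text.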
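
As a rough illustration of the pandas approach, the sketch below filters rows by word count and, separately, by a regular expression. The column name "sentence", the sample data, and the "fox" pattern are assumptions made up for this example, since the actual file layout is not shown in the question.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real csv file; the column name
# "sentence" is an assumption for this sketch.
sample = io.StringIO(
    "sentence\n"
    "the quick brown fox jumps over the lazy dog\n"
    "too short\n"
    "this sentence has exactly six words\n"
)
df = pd.read_csv(sample)

# Keep rows whose sentence has more than 5 whitespace-separated words
word_counts = df["sentence"].str.split().str.len()
long_sentences = df[word_counts > 5]

# Alternatively, filter with a regular expression (here: rows mentioning "fox")
fox_rows = df[df["sentence"].str.contains(r"\bfox\b", regex=True)]

print(long_sentences["sentence"].tolist())
print(fox_rows["sentence"].tolist())
```

For a file on disk, pd.read_csv('export.csv') would replace the in-memory sample, and the filtered frame could be written back out with to_csv.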

python

csv

text-processing