How to remove rows from a csv file when compared to a list in a txt file using Python?

Question

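I have a list of 12,000 dictionary entries (just the words, without their definitions) stored in a .txt file.

I have a complete dictionary with 62,000 entries (the words with their definitions) stored in a .csv file.

I need to compare the small list in the .txt file with the larger list in the .csv file and delete the rows containing entries that do not appear in the smaller list. In other words, I want to purge this dictionary down to only 12,000 entries.

The .txt file is laid out like this, one word per line: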

word1

word2

word3

The .csv file is laid out like this:

ID (column 1), WORD (column 2), MEANING (column 3)

How do I accomplish this using Python?

Answer 1

The following won't scale very well, but it should work for the number of records indicated.

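import csv

# Load the words to keep, stripping the trailing newline from each line
with open(path_to_file3, 'r') as words_file:
    lookup = set(word.strip() for word in words_file)

with open(path_to_file, 'r', newline='') as f_in, \
        open(path_to_file2, 'w', newline='') as f_out:
    csv_in = csv.reader(f_in)
    csv_out = csv.writer(f_out)
    for line in csv_in:
        # Column 2 (index 1) holds the word; keep the row only if it is wanted
        if line[1] in lookup:
            csv_out.writerow(line)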

Answer 2

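One of the lesser-known facts about current computers is that when you delete a line from a text file and save the file, the editor usually does this:

Load the file into memory
Write a temporary file with the lines you want
Close the files and move the temporary file over the original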
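
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `rewrite_keeping` and its `keep_line` parameter are hypothetical names, the file is assumed to be UTF-8 text, and the sketch reads the original line by line instead of loading it fully into memory.

```python
import os
import tempfile


def rewrite_keeping(path, keep_line):
    """Rewrite the file at `path`, keeping only lines where keep_line(line) is true.

    Hypothetical helper for illustration; assumes a UTF-8 text file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final move
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" leaves the original line endings untouched
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp, \
                open(path, encoding="utf-8", newline="") as src:
            for line in src:
                if keep_line(line):
                    tmp.write(line)
        # Both files are closed here; move the temporary file over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        os.remove(tmp_path)
        raise
```

For example, `rewrite_keeping('notes.txt', lambda line: line.strip() != 'word2')` would drop every line that consists only of `word2`. Writing to a temporary file first means the original stays intact if the filtering fails halfway through.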

So you have to load your word list:

with open('wordlist.txt') as i:
    wordlist = set(word.strip() for word in i)  #  you said the file was small

Then you open the input file:

import csv
import os

with open('input.csv', newline='') as i:
    with open('output.csv', 'w', newline='') as o:
        output = csv.writer(o)
        for line in csv.reader(i):  # iterate over the CSV row by row
            if line[1] in wordlist:  # keep the row if column 2, the word, is in the list
                output.writerow(line)

# move the filtered file over the original
os.replace('output.csv', 'input.csv')

This is untested, so now go do your homework, and comment here if you find any bugs... :-)

Answer 3

Good answers so far. If you want minimalism...

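import csv

lookup = set(l.strip().lower() for l in open(path_to_file3))
# writerows consumes the generator; a bare map() is lazy in Python 3 and would write nothing
with open(path_to_file, newline='') as f_in, open(path_to_file2, 'w', newline='') as f_out:
    csv.writer(f_out).writerows(
        row for row in csv.reader(f_in) if row[1].lower() in lookup)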

Answer 4

I would use pandas for this. The dataset isn't large, so you can do it in memory.

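import pandas as pd

# words.txt has no header row, so name its single column explicitly
words = pd.read_csv('words.txt', header=None, names=['WORD'])
defs = pd.read_csv('defs.csv')
words.set_index('WORD', inplace=True)
defs.set_index('WORD', inplace=True)
# an inner join keeps only the words present in both files
new_defs = words.join(defs, how='inner')
new_defs.to_csv('new_defs.csv')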

You may need to manipulate new_defs to make it look the way you want, but that's the gist of it.

python

csv

dictionary