Conditionally merge lines in text file
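I have a text file containing common misspellings and their corrections.
All misspellings of the same word should be on the same line.
I have merged some of them already, but not all misspellings of every word.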
misspellings_corpus.txt
(snippet):
I'de->I'd
aple->apple
appl->apple
I'ed, I'ld, Id->I'd
Desired:
I'de, I'ed, I'ld, Id->I'd
aple, appl->apple
Template: wrong1, wrong2, wrongN->correct
Attempt:
lines = []
with open('/content/drive/MyDrive/Colab Notebooks/misspellings_corpus.txt', 'r') as fin:
    lines = fin.readlines()
for this_idx, this_line in enumerate(lines):
    for comparison_idx, comparison_line in enumerate(lines):
        if this_idx != comparison_idx:
            if this_line.split('->')[1].strip() == comparison_line.split('->')[1].strip():
                #...
correct_words = [l.split('->')[1].strip() for l in lines]
correct_words
from collections import defaultdict

with open('misspellings_corpus.txt', 'r') as fin:
    lines = fin.readlines()

# Map each correct word to the list of misspellings seen for it.
my_dict = defaultdict(list)
for line in lines:
    wrong_part, correct = line.split("->")
    # split(",") also handles a line with a single misspelling and no comma.
    for curr in wrong_part.replace(" ", "").split(","):
        my_dict[correct.strip()].append(curr)

# Print in the template order: wrong1, wrong2, wrongN->correct
for key, values in my_dict.items():
    print(f"{', '.join(values)}->{key}")
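Store the correct spelling of each word as a key in a dictionary that maps to a set of possible misspellings of that word. The dict makes it easy to look up the word being corrected, and the set avoids duplicate misspellings.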
possible_misspellings = {}
with open('my-file.txt') as file:
    for line in file:
        misspellings, word = line.split('->')
        word = word.strip()
        misspellings = set(m.strip() for m in misspellings.split(','))
        if word in possible_misspellings:
            possible_misspellings[word].update(misspellings)
        else:
            possible_misspellings[word] = misspellings
Then you can iterate over your dictionary and write the merged lines to a new file:
with open('my-new-file.txt', 'w') as file:
    for word, misspellings in possible_misspellings.items():
        # Sets are unordered, so sort for a stable result; ', ' matches the template.
        line = ', '.join(sorted(misspellings)) + '->' + word + '\n'
        file.write(line)
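For a quick check without touching any files, the same idea can be sketched as a small function that works on a list of strings. The function name `merge_misspellings` and the `sample` list are hypothetical, made up for illustration. This variant assumes you want to keep the order in which misspellings first appear, so it uses a list with a membership test instead of a set, and it skips lines that do not contain `->`.

```python
def merge_misspellings(lines):
    """Merge 'wrong1, wrong2->correct' lines that share the same correction."""
    merged = {}  # correct word -> misspellings in first-seen order
    for line in lines:
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or malformed lines
        wrong_part, correct = line.rsplit("->", 1)
        bucket = merged.setdefault(correct.strip(), [])
        for wrong in wrong_part.split(","):
            wrong = wrong.strip()
            if wrong and wrong not in bucket:
                bucket.append(wrong)
    # dicts preserve insertion order, so corrections come out in first-seen order
    return [f"{', '.join(wrongs)}->{correct}" for correct, wrongs in merged.items()]


# Sample data taken from the snippet in the question.
sample = ["I'de->I'd", "aple->apple", "appl->apple", "I'ed, I'ld, Id->I'd"]
result = merge_misspellings(sample)
for merged_line in result:
    print(merged_line)
```

On the snippet from the question this should print the two desired lines. The list membership test is linear per lookup, which is fine for a small corpus; for a very large file, a set alongside the list would probably be the better choice.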