How to use a lemma list in Python

I have a problem: I'm using Python to analyze data. As a first step, I'm using a lemma list (lemas.txt) to preprocess my data. The lemma list looks like this, for example:

A-bomb -> A-bombs
abacus -> abacuses
abandon -> abandons,abandoning,abandoned
abase -> abases,abasing,abased
abate -> abates,abating,abated
abbess -> abbesses
abbey -> abbeys
abbot -> abbots

.......

Can you help me clean my data with this list in Python? Thanks.

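This code parses your lemma file and puts the entries into a dictionary, where each key is a word to be replaced and each value is the word it will be replaced with.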

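def parse_lemmas(lemma_lines):
    """Yield (inflected_form, lemma) pairs from 'lemma -> form1,form2,...' lines."""
    for line in lemma_lines:
        if not line:
            continue  # skip blank lines, which have no ' -> ' to split on
        target, from_words_str = line.split(' -> ')
        from_words = from_words_str.split(',')
        for word in from_words:
            yield (word, target)


with open('lemmas.txt', 'r') as lemmas_file:
    lemmas = dict(parse_lemmas(lemma_line.strip() for lemma_line in lemmas_file))

# lemmas now maps every inflected form in the file to its lemma,
# e.g. lemmas['abandoning'] == 'abandon'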
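To see what the dictionary ends up holding, here is a self-contained sketch that runs the same parsing on a few lines from the question's list. It assumes `io.StringIO` as a stand-in for the real file, and it also trims whitespace around each word, which the code above does not do; `SAMPLE` is a hypothetical name used only for this illustration.

```python
import io

# A few lines from the question's lemma list; io.StringIO stands in for
# open('lemmas.txt') so the example runs without a file on disk.
SAMPLE = """A-bomb -> A-bombs
abandon -> abandons,abandoning,abandoned
abase -> abases,abasing,abased

abbey -> abbeys
"""


def parse_lemmas(lemma_lines):
    """Yield (inflected_form, lemma) pairs from 'lemma -> form1,form2' lines."""
    for line in lemma_lines:
        line = line.strip()
        if not line:
            # Skip blank lines instead of failing on split()
            continue
        target, from_words_str = line.split(' -> ')
        for word in from_words_str.split(','):
            yield (word.strip(), target.strip())


lemmas = dict(parse_lemmas(io.StringIO(SAMPLE)))
print(lemmas['abandoning'])  # abandon
print(lemmas['A-bombs'])     # A-bomb
print('abandon' in lemmas)   # False: the lemmas themselves are not keys
```

Because the base forms are not keys, the lookup later relies on `lemmas.get(word, word)` to leave them unchanged.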

Once you have split your data into a list of words, you can run the following code.

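# if your data isn't too large, build a new list;
# words that are not in the dictionary are kept unchanged
new_data = [lemmas.get(word, word) for word in data]

# if it's so large you don't want to make another copy,
# you can do it in place (enumerate supplies each word's index)
for idx, word in enumerate(data):
    data[idx] = lemmas.get(word, word)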
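A quick check of both loops on a toy input; the mapping and the word list here are made up for illustration and are not from the question's data.

```python
# Hypothetical toy mapping built from a few entries of the question's list
lemmas = {
    'abandons': 'abandon', 'abandoning': 'abandon', 'abandoned': 'abandon',
    'abbeys': 'abbey', 'abbots': 'abbot',
}

data = ['the', 'abbots', 'abandoned', 'two', 'abbeys']

# Copying version: builds a new list and leaves data untouched
new_data = [lemmas.get(word, word) for word in data]

# In-place version: enumerate() supplies the index for each word
for idx, word in enumerate(data):
    data[idx] = lemmas.get(word, word)

print(new_data)  # ['the', 'abbot', 'abandon', 'two', 'abbey']
```

Note that the lookup is case-sensitive, so a word like `Abbots` would pass through unchanged; lowercasing each word before the lookup is one option if that matters for your data.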

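Note that your data doesn't have to consist of words only; for example, you could split "This is your data. This, here, is your data with punctuation; see?" into ['This', 'is', 'your', 'data', '.', 'This', ',', 'here', ',', 'is', 'your', 'data', 'with', 'punctuation', ';', 'see', '?']. In that case the punctuation tokens are simply passed through unchanged, since they are not keys in the dictionary. The best way to do this depends on your actual data and on what information you need to preserve when splitting and recombining it.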
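One possible way to do the splitting and recombining is a regular-expression tokenizer. The sketch below assumes that words may contain internal hyphens (as in `A-bomb`) and that every punctuation mark is its own token; `TOKEN_RE`, `tokenize` and `detokenize` are hypothetical names, not part of any library. On the example sentence it produces the split shown in the previous paragraph.

```python
import re

# Words (allowing internal hyphens, e.g. "A-bomb") or single punctuation marks
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word tokens and one-character punctuation tokens."""
    return TOKEN_RE.findall(text)


def detokenize(tokens):
    """Join tokens with spaces, but attach punctuation to the preceding token."""
    out = ''
    for tok in tokens:
        if out and not re.fullmatch(r"[^\w\s]", tok):
            out += ' '
        out += tok
    return out


sentence = "This is your data. This, here, is your data with punctuation; see?"
tokens = tokenize(sentence)

# Hypothetical mapping; punctuation is never a key, so it passes through
lemmas = {'abandoned': 'abandon', 'abbots': 'abbot'}
lemmatized = [lemmas.get(t, t) for t in tokenize("The abbots abandoned it; see?")]
result = detokenize(lemmatized)
print(result)  # The abbot abandon it; see?
```

The rejoining rule is deliberately simple and lossy: the original spacing is not kept, and opening brackets or quotes would be glued to the preceding word. Whether that is acceptable is exactly the question of what information you need to preserve.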