Handling English verbs that end in 'e'
I am implementing a string replacer for transformations like these:
'thou sittest' → 'you sit'
'thou walkest' → 'you walk'
'thou liest' → 'you lie'
'thou risest' → 'you rise'
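If I stay naive, I could use a regex find-and-replace here, such as thou [a-z]+est. But English verbs that end in e cause trouble: depending on the context, I need to trim est in some cases and just st in the rest.
What is a quick way to achieve this?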
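To make the problem concrete, here is a minimal sketch of the naive approach using only the standard re module. The names NAIVE and naive_replace are illustrative, not part of any library. It always strips the full est, so it should give 'you walk' correctly but 'you li' and 'you ris' for the verbs ending in e (and 'you sitt' for the doubled consonant).

```python
import re

# Naive rule: after "thou", always strip the full "est" suffix.
NAIVE = re.compile(r"\bthou ([a-z]+)est\b")

def naive_replace(text):
    """Replace 'thou <verb>est' with 'you <verb>' using a fixed suffix strip."""
    return NAIVE.sub(r"you \1", text)

for phrase in ["thou sittest", "thou walkest", "thou liest", "thou risest"]:
    print(phrase, "->", naive_replace(phrase))
```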
Probably the quickest and dirtiest:
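    import nltk

    # Requires the NLTK "words" corpus: nltk.download('words')
    words = set(nltk.corpus.words.words())

    for old in 'sittest walkest liest risest'.split():
        new = old[:-2]  # drop the trailing "st"
        # Trim one letter at a time until a dictionary word remains
        while new and new not in words:
            new = new[:-1]
        print(old, new)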
Output:
sittest sit
walkest walk
liest lie
risest rise
Update. Slightly less quick and dirty (it works, for example, for rotest → the verb rot rather than the noun rote):
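    from nltk.corpus import wordnet as wn

    # Requires the WordNet corpus: nltk.download('wordnet')
    for old in 'sittest walkest liest risest rotest'.split():
        new = old[:-2]  # drop the trailing "st"
        # Trim one letter at a time until WordNet knows the remainder as a verb
        while new and not wn.synsets(new, pos='v'):
            new = new[:-1]
        print(old, new)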
Output:
sittest sit
walkest walk
liest lie
risest rise
rotest rot
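The snippets above only reduce single verb forms. A rough sketch of wiring the same trimming loop into the 'thou ...' → 'you ...' replacement might look like the following. The names KNOWN_VERBS, is_verb, modernize_verb and replace_thou are hypothetical, and the tiny verb set is only a stand-in so the example runs without NLTK data. Falling back to the original word when nothing matches is an added assumption, not part of the loops above.

```python
import re

# Hypothetical stand-in for a real verb lexicon. With NLTK one could instead
# pass: lambda w: bool(wn.synsets(w, pos='v'))
KNOWN_VERBS = {"sit", "walk", "lie", "rise", "rot"}

def is_verb(word):
    return word in KNOWN_VERBS

def modernize_verb(old, is_verb=is_verb):
    """Drop 'st', then trim letters until the remainder is a known verb."""
    new = old[:-2]
    while new and not is_verb(new):
        new = new[:-1]
    # Assumption: keep the original form if no known verb was found.
    return new or old

# Match "thou" followed by a lowercase word ending in "st".
THOU_VERB = re.compile(r"\bthou ([a-z]+st)\b")

def replace_thou(text, is_verb=is_verb):
    """Rewrite every 'thou <verb>(e)st' in text as 'you <verb>'."""
    return THOU_VERB.sub(
        lambda m: "you " + modernize_verb(m.group(1), is_verb), text
    )

print(replace_thou("thou sittest and thou liest while thou risest"))
```

Passing the lexicon check in as a function keeps the regex part independent of whether the word list or WordNet is used for the lookup.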