How to extract the verbs and all corresponding adverbs from a text?

Using ngrams in Python, my goal is to find the verbs and their corresponding adverbs in an input text. Here is what I did:

Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated.They are talking weirdly. She runs fast. She is talking greatly.Jack runs slow."

Code:

`finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',))
scored = finder2.score_ngrams(bigram_measures.raw_freq)
print sorted(finder2.nbest(bigram_measures.raw_freq, 5))`

From my code I get the output: [('talking', 'greatly'), ('talking', 'weirdly'), ('weirdly', 'talking'),('runs','fast'),('runs','slow')] This is a list of verbs and their corresponding adverbs.

What I am looking for:

I want to find each verb together with all of its corresponding adverbs, for example ('talking': 'greatly', 'weirdly'), ('runs': 'fast', 'slow'), etc.

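I think you are losing information that you need for this. You have to keep the part-of-speech data somehow, so that bigrams like ('weirdly', 'talking') can be handled correctly.

It may be that the bigram finder can accept tagged-word tuples (I am not familiar with nltk). Otherwise, you may have to resort to building an external index. If so, something like this might work: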

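# external index: word -> part-of-speech tag
part_of_speech = {word: tag for word, tag in posTagged}
best_bigrams = finder2.nbest(... as you like it ...)

# put the verb first: swap any bigram whose second word is not an adverb
verb_first_bigrams = [b if part_of_speech[b[1]] == 'RB' else (b[1], b[0]) for b in best_bigrams]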

Then, with the verb in front, you can turn it into a dictionary or a list of lists or whatever:

adverbs_for = {}
for verb,adverb in verb_first_bigrams:
    if verb not in adverbs_for:
        adverbs_for[verb] = [adverb]
    else:
        adverbs_for[verb].append(adverb)
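
To see the two steps above working together, here is a minimal sketch that runs without nltk. The tagged tokens and the bigram list are hand-written stand-ins for `posTagged` and the output of `finder2.nbest(...)`; they are assumptions for illustration, not real tagger output. It also skips an adverb that is already recorded, which the loop above does not do, because ('talking', 'weirdly') and the swapped ('weirdly', 'talking') would otherwise be stored twice.

```python
# Hand-written stand-ins (assumed): tagged words and the finder's best bigrams.
posTagged = [("talking", "VBG"), ("weirdly", "RB"), ("greatly", "RB"),
             ("runs", "VBZ"), ("fast", "RB"), ("slow", "RB")]
best_bigrams = [("talking", "greatly"), ("talking", "weirdly"),
                ("weirdly", "talking"), ("runs", "fast"), ("runs", "slow")]

# External index: word -> part-of-speech tag.
part_of_speech = {word: tag for word, tag in posTagged}

# Keep a bigram whose second word is an adverb; otherwise swap it.
verb_first_bigrams = [b if part_of_speech[b[1]] == "RB" else (b[1], b[0])
                      for b in best_bigrams]

adverbs_for = {}
for verb, adverb in verb_first_bigrams:
    # setdefault creates the list on first sight of the verb;
    # the membership test drops duplicate adverbs.
    if adverb not in adverbs_for.setdefault(verb, []):
        adverbs_for[verb].append(adverb)

print(adverbs_for)
```

Note that a plain word-to-tag dictionary keeps only one tag per word, so a word that is tagged differently in different sentences will be looked up with whichever tag was seen last.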

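You already have a list of all the verb-adverb bigrams, so you are really just asking how to consolidate them into a dictionary that gives all the adverbs for each verb. But first, let's re-create your bigrams in a more direct way: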

import nltk

pairs = list()
for (w1, tag1), (w2, tag2) in nltk.bigrams(posTagged):
    # keep a pair only when a verb is immediately followed by an adverb
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))
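
As a rough check of that filter, the sketch below runs it on a small hand-tagged sample. The tags are assigned by hand for illustration (a real tagger may well label "slow" as an adjective, for instance), and `zip(posTagged, posTagged[1:])` is used in place of `nltk.bigrams` only so that the snippet runs without nltk installed; both yield the same adjacent pairs.

```python
# Hand-tagged stand-in for a tagger's output (tags are assumed).
posTagged = [
    ("He", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("She", "PRP"), ("runs", "VBZ"), ("fast", "RB"), (".", "."),
    ("The", "DT"), ("sun", "NN"), ("is", "VBZ"), ("beautiful", "JJ"), (".", "."),
    ("Jack", "NNP"), ("runs", "VBZ"), ("slow", "RB"), (".", "."),
]

pairs = []
# Walk over adjacent (word, tag) pairs, like nltk.bigrams(posTagged).
for (w1, tag1), (w2, tag2) in zip(posTagged, posTagged[1:]):
    # Any verb tag (VB, VBZ, VBG, ...) immediately followed by an adverb.
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))

print(pairs)
```

Because the verb always comes first here, there is no reversed ('weirdly', 'talking') pair to repair afterwards. The trade-off is that only an adverb directly after the verb is caught.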

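Now for your question: we will build a dictionary with the adverbs that follow each verb. I will store the adverbs in a set rather than a list, to get rid of duplicates.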

from collections import defaultdict
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)

The defaultdict provides an empty set for verbs that have not been seen before, so we do not need to check by hand.
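
A small self-contained run may make the behavior clearer. The `pairs` list here is made up for illustration and deliberately contains a repeated pair; the `result` dictionary is an extra step, added only because sets are unordered and sorted lists print more predictably.

```python
from collections import defaultdict

# Illustrative input (assumed), with ("talking", "weirdly") appearing twice.
pairs = [("talking", "weirdly"), ("runs", "fast"), ("talking", "weirdly"),
         ("runs", "slow"), ("talking", "greatly")]

consolidated = defaultdict(set)
for verb, adverb in pairs:
    # A verb seen for the first time gets a fresh empty set automatically.
    consolidated[verb].add(adverb)

# Convert each set to a sorted list for stable display.
result = {verb: sorted(adverbs) for verb, adverbs in consolidated.items()}
print(result)
```

One thing to keep in mind: merely reading `consolidated[some_verb]` for an unseen verb inserts it with an empty set, so use `some_verb in consolidated` or `consolidated.get(some_verb)` when you only want to look something up.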

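Depending on the details of your assignment, you may also want to case-fold and lemmatize the verbs, so that the adverbs from "Driving recklessly" and "I drove carefully" are recorded together: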

wnl = nltk.stem.WordNetLemmatizer()
...
for verb, adverb in pairs:
    verb = wnl.lemmatize(verb.lower(), "v")
    consolidated[verb].add(adverb)
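
The effect of that normalization can be sketched without the WordNet data. In the snippet below, `VERB_LEMMAS` and `lemmatize_verb` are hypothetical stand-ins for `wnl.lemmatize(verb.lower(), "v")`, and the lookup table covers only the example words; with nltk and its WordNet corpus available you would call the lemmatizer instead.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for WordNetLemmatizer (assumed values).
VERB_LEMMAS = {"driving": "drive", "drove": "drive", "runs": "run"}

def lemmatize_verb(word):
    """Case-fold the word, then map it to its base form if we know one."""
    word = word.lower()
    return VERB_LEMMAS.get(word, word)

# Illustrative verb-adverb pairs (assumed).
pairs = [("Driving", "recklessly"), ("drove", "carefully"), ("runs", "fast")]

consolidated = defaultdict(set)
for verb, adverb in pairs:
    # "Driving" and "drove" both end up under the key "drive".
    consolidated[lemmatize_verb(verb)].add(adverb)

print(dict(consolidated))
```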