NLTK quadgram collocation finder
I have seen many questions and answers saying that NLTK collocations cannot go beyond bigrams and trigrams.
For example, this one: How to get n-gram collocations and association in python nltk?
I see that there is something called nltk.QuadgramCollocationFinder, similar to nltk.BigramCollocationFinder and nltk.TrigramCollocationFinder.
But I cannot find anything like nltk.collocations.QuadgramAssocMeasures(), analogous to nltk.collocations.BigramAssocMeasures() and nltk.collocations.TrigramAssocMeasures().
If it is not possible (without hacks) to find n-grams beyond bigrams and trigrams, what is the purpose of nltk.QuadgramCollocationFinder?
Maybe I am missing something.
Thanks,
Update: I have added code and revised the question based on Alvas's input. This now works:
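import nltk
from nltk.collocations import BigramCollocationFinder, QuadgramCollocationFinder
from nltk.corpus import PlaintextCorpusReader
from nltk.metrics.association import QuadgramAssocMeasures

bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()
quadgram_measures = QuadgramAssocMeasures()

# `corpus` is assumed to be a sequence of word tokens loaded beforehand
# (the loading step is not shown in this post).

# apply_ngram_filter() drops every n-gram for which the function returns True,
# so this keeps only n-grams that contain 'crazy'.
the_filter = lambda *w: 'crazy' not in w

# Top 10 bigrams that occur at least 3 times, ranked by likelihood ratio.
finder = BigramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)
finder.apply_ngram_filter(the_filter)
print(finder.nbest(bigram_measures.likelihood_ratio, 10))

# Same steps for quadgrams.
finder = QuadgramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)
finder.apply_ngram_filter(the_filter)
print(finder.nbest(quadgram_measures.likelihood_ratio, 10))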
From the repo:
from nltk.metrics.association import QuadgramAssocMeasures
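For anyone who wants to try that import end to end, here is a minimal sketch. It assumes an NLTK 3.x install in which both QuadgramCollocationFinder and QuadgramAssocMeasures are available. The token list is made up for illustration, and raw_freq is used only to keep the toy example simple; the likelihood_ratio measure from the question can be passed to nbest() in the same way.

```python
from nltk.collocations import QuadgramCollocationFinder
from nltk.metrics.association import QuadgramAssocMeasures

# Toy token list (made up for illustration); a real corpus would be much larger.
tokens = "the quick brown fox jumps over the quick brown fox again".split()

quadgram_measures = QuadgramAssocMeasures()
finder = QuadgramCollocationFinder.from_words(tokens)

# Keep only quadgrams that occur at least twice.
finder.apply_freq_filter(2)

# apply_ngram_filter() removes n-grams for which the function returns True,
# so this keeps only quadgrams that contain 'fox'.
finder.apply_ngram_filter(lambda *w: 'fox' not in w)

# Rank the surviving quadgrams by raw frequency and take up to 5.
best = finder.nbest(quadgram_measures.raw_freq, 5)
print(best)
```

On this toy input the only quadgram that occurs twice is ('the', 'quick', 'brown', 'fox'), so that is the single result. No corpus download is needed, since the finder works on any sequence of tokens.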