N-gram Language Model returns nothing
I'm following the tutorial here to create a language model: https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-language-model-nlp-python-code/#h2_5. I'm working on the N-gram language model.
Here is the full code:
from nltk.corpus import reuters  # the corpus must be downloaded first: nltk.download('reuters')
from nltk import bigrams, trigrams
from collections import Counter, defaultdict

# Create a placeholder for model
model = defaultdict(lambda: defaultdict(lambda: 0))

# Count frequency of co-occurrence
for sentence in reuters.sents():
    for w1, w2, w3 in trigrams(sentence, pad_right=True, pad_left=True):
        model[(w1, w2)][w3] += 1

# Let's transform the counts to probabilities
for w1_w2 in model:
    total_count = float(sum(model[w1_w2].values()))
    for w3 in model[w1_w2]:
        model[w1_w2][w3] /= total_count

input = input("Hi there! Please enter an incomplete sentence and I can help you "
              "finish it!\n").lower().split()
print(model[tuple(input)])
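The two loops above estimate, for every two-word context, the probability of the next word by relative frequency (a maximum-likelihood estimate), where C counts the padded trigrams collected from the Reuters sentences:

```latex
P(w_3 \mid w_1, w_2) = \frac{C(w_1, w_2, w_3)}{\sum_{w} C(w_1, w_2, w)}
```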
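To get output from the model, the website does this: print(dict(model["the", "price"]))
But I want to generate output from a sentence the user types in. When I write print(model[tuple(input)]), it gives me an empty defaultdict.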
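Why the lookup comes back empty: every key in model is a two-word tuple (w1, w2), so a tuple built from the whole sentence never matches, and a defaultdict answers a missing key by creating an empty entry instead of raising KeyError. A minimal sketch with a toy model whose probabilities are made up, standing in for the Reuters counts:

```python
from collections import defaultdict

# Toy stand-in for the Reuters model (made-up probabilities, illustration only):
# keys are 2-word tuples, values map the next word to its probability.
model = defaultdict(lambda: defaultdict(lambda: 0))
model[("the", "price")]["of"] = 0.75
model[("the", "price")]["was"] = 0.25

words = "what is the price".split()

# The whole sentence is a 4-tuple, which never appears as a key,
# so defaultdict silently creates and returns a new empty inner dict.
print(dict(model[tuple(words)]))       # {}

# Indexing with a missing key also inserts it, so the model grows.
print(tuple(words) in model)           # True

# Only the last two words have the same shape as the trained keys.
print(dict(model[tuple(words[-2:])]))  # {'of': 0.75, 'was': 0.25}
```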
Ignore this (kept for history):
How do I give it the list I create from the input? model is a dictionary, and I've read that using a list as a key isn't a good idea, but isn't that exactly what they're doing? I'm assuming mine doesn't work because I'm passing it a list. Would I have to iterate through the words to get results?
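On the list-as-key worry: model["the", "price"] does not use a list. The comma inside the brackets builds the tuple ("the", "price"), and tuples are hashable while lists are not. A small demonstration with a plain dict (counts and words are illustrative names only):

```python
counts = {}

# model["the", "price"] is shorthand for model[("the", "price")]:
# the comma builds a tuple, and tuples are hashable, so they can be dict keys.
counts["the", "price"] = 1
print(("the", "price") in counts)  # True

# A list is mutable and therefore unhashable, so it is rejected as a key.
list_key_rejected = False
try:
    counts[["the", "price"]] = 1
except TypeError as err:
    list_key_rejected = True
    print("TypeError:", err)

# Converting the input list with tuple() is therefore the right idea;
# what goes wrong in the question is the tuple's length, not its type.
words = ["the", "price"]
print(counts[tuple(words)])  # 1
```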
As a side note, is this model considering the sentence as a whole to predict the next word, or just the last word?
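On the side question: a trigram model conditions only on the two previous words, never on the whole sentence. The sketch below rebuilds the same kind of model from a tiny made-up corpus without NLTK, padding each sentence with two None symbols on each side (assumed to match what trigrams(..., pad_left=True, pad_right=True) produces), and then completes text greedily. build_trigram_model and complete are hypothetical helper names:

```python
from collections import defaultdict


def build_trigram_model(sentences):
    """Count padded trigrams and normalize them into P(w3 | w1, w2)."""
    model = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        # Two None symbols on each side, like NLTK's default trigram padding.
        padded = [None, None] + sentence + [None, None]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            model[(w1, w2)][w3] += 1
    for context in model:
        total = sum(model[context].values())
        for w3 in model[context]:
            model[context][w3] /= total
    return model


def complete(model, words, max_words=10):
    """Greedily append the most likely next word.

    Only the last two words are used as context; that is all a trigram model sees.
    """
    words = list(words)
    for _ in range(max_words):
        context = tuple(([None, None] + words)[-2:])
        candidates = model.get(context)  # .get() does not insert missing keys
        if not candidates:
            break
        next_word = max(candidates, key=candidates.get)
        if next_word is None:  # end-of-sentence padding
            break
        words.append(next_word)
    return words


toy_corpus = [  # made-up sentences, not Reuters data
    "the price of oil rose".split(),
    "analysts said the price of oil rose".split(),
    "the price of gold fell".split(),
]
model = build_trigram_model(toy_corpus)
print(" ".join(complete(model, "yesterday the price".split())))
# yesterday the price of oil rose
print(" ".join(complete(model, "nobody knows the price".split())))
# nobody knows the price of oil rose
```

Both prompts end in "the price", so they receive the same continuation; the words before that never influence the prediction.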
I had to give the model the last two words of the list instead of the whole list, even if it is two words. Like this:
model[tuple(input[-2:])]
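The input[-2:] fix can be wrapped in a small helper. predict_next is a hypothetical name; the sketch assumes the model is a nested defaultdict like the one above, with None as the padding symbol, so a one-word input is looked up as the start of a sentence:

```python
from collections import defaultdict


def predict_next(model, words, top_k=3):
    """Return up to top_k (word, probability) pairs for the next word.

    Uses only the last two words, the context a trigram model is keyed on.
    Shorter inputs are left-padded with None (assumed padding symbol).
    """
    context = tuple(([None, None] + list(words))[-2:])
    # .get() avoids defaultdict's side effect of inserting empty entries.
    candidates = model.get(context, {})
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy model with made-up probabilities, standing in for the Reuters model.
model = defaultdict(lambda: defaultdict(float))
model[("the", "price")].update({"of": 0.6, "was": 0.3, "is": 0.1})
model[(None, "the")].update({"company": 0.7, "price": 0.3})

print(predict_next(model, "what is the price".split(), top_k=2))  # [('of', 0.6), ('was', 0.3)]
print(predict_next(model, ["the"]))  # [('company', 0.7), ('price', 0.3)]
print(predict_next(model, "completely unseen words".split()))  # []
print(len(model))  # 2: the lookups above added no entries
```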