ConditionalFreqDist to find most frequent POS tags for words
I am trying to find the most common POS tag for each word in my dataset, but I am stuck on the ConditionalFreqDist part.
import nltk
from nltk import ConditionalFreqDist

tw = nltk.corpus.brown.tagged_words()
train_idx = int(0.8 * len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]

words = list(zip(*training_set))[0]
tags = list(zip(*training_set))[1]

ofd = ConditionalFreqDist(word for word in words)
ofd.tabulate(conditions=words, samples=tags)
ValueError: too many values to unpack (expected 2)
As you may have read in the documentation, ConditionalFreqDist helps you compute

"A collection of frequency distributions for a single experiment run under different conditions."

The only thing you have to provide is an iterable of (condition, sample) pairs, which in this question translate to words and their corresponding POS tags. Your ValueError comes from feeding it bare words: ConditionalFreqDist tries to unpack each item into a (condition, sample) pair, and a single string cannot be unpacked into two values. The code with minimal changes looks like the following; it computes the distribution over the whole corpus but tabulates only the first 10 conditions and samples (to keep the output manageable):
import nltk
from nltk import ConditionalFreqDist

tw = nltk.corpus.brown.tagged_words()
train_idx = int(0.8 * len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]

words = list(zip(*training_set))[0]  # conditions
tags = list(zip(*training_set))[1]   # samples

# Each item must be a (condition, sample) pair, here (word, tag).
ofd = ConditionalFreqDist(zip(words, tags))
ofd.tabulate(conditions=words[:10], samples=tags[:10])
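Once the ConditionalFreqDist is built with words as conditions, each condition holds a FreqDist of tags, and FreqDist.max() returns the most frequent sample, which directly answers the original question. Here is a minimal self-contained sketch using a tiny hand-made tagged list (hypothetical data, just for illustration) instead of the Brown corpus:

```python
from nltk import ConditionalFreqDist

# Tiny hypothetical tagged corpus: (word, tag) pairs.
tagged = [
    ('the', 'AT'), ('dog', 'NN'), ('runs', 'VBZ'),
    ('the', 'AT'), ('run', 'NN'), ('run', 'VB'), ('run', 'NN'),
]

# Condition on the word, so each word gets its own FreqDist of tags.
cfd = ConditionalFreqDist(tagged)

# FreqDist.max() gives the most frequent sample for that condition,
# i.e. the most common POS tag for the word.
print(cfd['run'].max())  # 'NN' occurs twice vs 'VB' once
print(cfd['the'].max())
```

On the real data you would build the same structure with `ConditionalFreqDist(training_set)` (the tagged words are already (word, tag) pairs) and then look up `ofd[word].max()` for any word of interest.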