How to get the tagset from nltk pos_tag?
I am trying to get the full tag names from nltk pos_tag, but I can't find a simple way to do it with nltk. For example, something like tagset='universal'.
import nltk
from nltk.tokenize import word_tokenize

def nltk_pos(text):
    # Tokenize the text, tag it, and return the tag of the first token
    tokens = word_tokenize(text)
    return nltk.pos_tag(tokens)[0][1]

nltk_pos('home')
output: 'NN'
expected output: 'NOUN'
I ran into the same problem while doing NLP analysis for a paper I was writing. I had to use a mapping function like this:
import nltk
from nltk.tokenize import word_tokenize

def get_full_tag_pos(pos_tag):
    # Map the first letter of a Penn Treebank tag to a coarse tag name
    tag_dict = {"J": "ADJ",
                "N": "NOUN",
                "V": "VERB",
                "R": "ADV"}
    # assuming pos_tag comes in as capital letters, e.g. 'JJR' or 'NN';
    # anything not in tag_dict falls back to 'NOUN'
    return tag_dict.get(pos_tag[0], 'NOUN')

# example
text = "The quick brown fox jumps"  # placeholder input, not from the original post
words = word_tokenize(text)
words_pos = nltk.pos_tag(words)
full_tag_words_pos = [word + "/" + get_full_tag_pos(tag) for word, tag in words_pos]
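The first-letter mapping above sends every unknown tag to 'NOUN', which mislabels determiners, prepositions, numbers and so on. As a rough sketch of two alternatives: `nltk.pos_tag` accepts a `tagset` argument, so `tagset='universal'` can return coarse tags directly (this assumes the tagger model and the `universal_tagset` resource have been downloaded), and an explicit lookup table can cover more tags by hand. The names `PTB_TO_UNIVERSAL`, `to_universal` and `tag_universal` are made up for this example, and the table is deliberately partial, not NLTK's own mapping.

```python
# Hand-written, partial Penn Treebank -> universal tag table (illustrative only).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "CD": "NUM", "DT": "DET", "IN": "ADP",
    "PRP": "PRON", "CC": "CONJ",
}


def to_universal(ptb_tag):
    """Return the coarse tag for a Penn Treebank tag, or 'X' if it is not in the table."""
    return PTB_TO_UNIVERSAL.get(ptb_tag, "X")


def tag_universal(text):
    """Tag text with NLTK, asking pos_tag for universal tags directly.

    Assumes nltk is installed and its tokenizer, tagger and
    universal_tagset data have already been downloaded.
    """
    import nltk
    from nltk.tokenize import word_tokenize

    return nltk.pos_tag(word_tokenize(text), tagset="universal")


# Converting tags that were already produced by nltk.pos_tag:
tagged = [("home", "NN"), ("quickly", "RB"), ("the", "DT")]
converted = [(word, to_universal(tag)) for word, tag in tagged]
```

With the data installed, `tag_universal('home')` is the shortest route to the 'NOUN' style output asked for in the question. The lookup table is mainly useful when the text has already been tagged with Penn Treebank tags.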