spaCy token.tag_ full list
The official spaCy documentation describes token.tag_ as follows:
A fine-grained, more detailed tag that represents the word-class and some basic morphological information for the token. These tags are primarily designed to be good features for subsequent models, particularly the syntactic parser. They are language and treebank dependent. The tagger is trained to predict these fine-grained tags, and then a mapping table is used to reduce them to the coarse-grained .pos tags.
But it does not list the full set of available tags or explain what each one means. Where can I find that?
I finally found it in spaCy's source code: glossary.py. And this link explains the meaning of the different tags.
Here is the list of tags:
TAG_MAP = [
".",
",",
"-LRB-",
"-RRB-",
"``",
"\"\"",
"''",
":",
"$",
"#",
"AFX",
"CC",
"CD",
"DT",
"EX",
"FW",
"HYPH",
"IN",
"JJ",
"JJR",
"JJS",
"LS",
"MD",
"NIL",
"NN",
"NNP",
"NNPS",
"NNS",
"PDT",
"POS",
"PRP",
"PRP$",
"RB",
"RBR",
"RBS",
"RP",
"SP",
"SYM",
"TO",
"UH",
"VB",
"VBD",
"VBG",
"VBN",
"VBP",
"VBZ",
"WDT",
"WP",
"WP$",
"WRB",
"ADD",
"NFP",
"GW",
"XX",
"BES",
"HVS",
"_SP",
]
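The available values of token.tag_ are language specific. By language I don't mean English or Portuguese, but 'en_core_web_sm' or 'pt_core_news_sm'. In other words, they are specific to the language model. They are defined in the TAG_MAP, which is customizable and trainable. If you don't customize it, you get the default TAG_MAP for that language.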
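As a rough illustration of the tag set belonging to a pipeline's tagger rather than to the language itself, the sketch below builds a blank English pipeline, adds an untrained tagger, and registers two labels by hand. It assumes spaCy v3, and the two labels are arbitrary picks for illustration. With a trained pipeline such as en_core_web_sm (installed separately) you would load it with spacy.load() and read the same labels property instead.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Add an untrained tagger; it starts with no labels at all.
tagger = nlp.add_pipe("tagger")

# Register two fine-grained tags by hand (arbitrary, for illustration).
for label in ("NN", "VBZ"):
    tagger.add_label(label)

# The tag set is a property of this tagger component, not of English itself.
print(tagger.labels)
```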
At the time of writing this answer, spacy.io/models lists all pretrained models and their labeling schemes.
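Now, for the explanations. If you are working with English or German text, you're in luck! You can use spacy.explain() or access its glossary on GitHub for the full list. If you are working with another language, the token.pos_ values are always those of Universal Dependencies and will work regardless.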
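A minimal sketch of looking tags up through the glossary, assuming spaCy is installed (no trained model is needed for this). The helper name describe_tags and the placeholder tag "NOT_A_TAG" are made up for illustration. The helper checks the GLOSSARY dict first so that unknown tags come back as None.

```python
import spacy
from spacy.glossary import GLOSSARY


def describe_tags(tags):
    """Map each tag to its glossary description, or None if it is not listed."""
    return {tag: spacy.explain(tag) if tag in GLOSSARY else None for tag in tags}


# Two fine-grained English tags, one universal POS tag, one made-up tag.
descriptions = describe_tags(["NNP", "VBZ", "ADP", "NOT_A_TAG"])
for tag, description in descriptions.items():
    print(f"{tag:10} {description}")
```

Because GLOSSARY is a plain dict, iterating over it is another way to dump every tag spaCy can explain.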
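Finally, if you are working with other languages, for a full explanation of the tags you will have to look in the sources listed on the models page for the model you are interested in. For instance, for Portuguese I had to track down the explanations of the tags in the Portuguese UD Bosque Corpus that was used to train the model.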
The list of tags and POS values that spaCy uses is at the link below.
https://spacy.io/api/annotation
- Universal part-of-speech tags
- English
- German
You can get an explanation with:
from spacy import glossary
tag_name = 'ADP'
glossary.explain(tag_name)
Version: 3.3.0
Source: https://github.com/explosion/spaCy/blob/master/spacy/glossary.py