How to interpret values of feats when using udpipe and R
In the R udpipe package, if we run the following code:
library(udpipe)
x <- udpipe("The economy is weak but the outlook is bright. the property market will be booming next year", "english")
The result is:
doc_id paragraph_id sentence_id sentence start end term_id token_id token lemma upos
1 doc1 1 1 The economy is weak but the outlook is bright 1 3 1 1 The the DET
2 doc1 1 1 The economy is weak but the outlook is bright 5 11 2 2 economy economy NOUN
3 doc1 1 1 The economy is weak but the outlook is bright 13 14 3 3 is be AUX
4 doc1 1 1 The economy is weak but the outlook is bright 16 19 4 4 weak weak ADJ
5 doc1 1 1 The economy is weak but the outlook is bright 21 23 5 5 but but CCONJ
6 doc1 1 1 The economy is weak but the outlook is bright 25 27 6 6 the the DET
7 doc1 1 1 The economy is weak but the outlook is bright 29 35 7 7 outlook outlook NOUN
8 doc1 1 1 The economy is weak but the outlook is bright 37 38 8 8 is be AUX
9 doc1 1 1 The economy is weak but the outlook is bright 40 45 9 9 bright bright ADJ
xpos feats head_token_id dep_rel deps misc
1 DT Definite=Def|PronType=Art 2 det <NA> <NA>
2 NN Number=Sing 4 nsubj <NA> <NA>
3 VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 cop <NA> <NA>
4 JJ Degree=Pos 0 root <NA> <NA>
5 CC <NA> 9 cc <NA> <NA>
6 DT Definite=Def|PronType=Art 7 det <NA> <NA>
7 NN Number=Sing 9 nsubj <NA> <NA>
8 VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 9 cop <NA> <NA>
9 JJ Degree=Pos 4 conj <NA> SpacesAfter=\n
I read through https://universaldependencies.org/ext-feat-index.html, but I still don't understand what the feats values mean here.
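These are the morphological features of the words: for example gender, number, and grammatical case for nouns; person, number, aspect, etc. for verbs.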
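Each feats value in the output above is a `|`-separated list of `Name=Value` pairs, so it can be unpacked with base R. Below is a minimal sketch; `parse_feats` is a hypothetical helper name made up for this illustration and is not part of udpipe.

```r
# Hypothetical helper (not part of udpipe): split one feats string
# such as "Number=Sing|Person=3" into a named character vector.
parse_feats <- function(feats) {
  # Tokens without morphological features have feats = NA
  if (is.na(feats)) return(character(0))
  # Split into "Name=Value" pairs, then split each pair at "="
  pairs <- strsplit(feats, "|", fixed = TRUE)[[1]]
  kv <- strsplit(pairs, "=", fixed = TRUE)
  values <- vapply(kv, function(p) p[2], character(1))
  names(values) <- vapply(kv, function(p) p[1], character(1))
  values
}

# The feats string of the token "is" from the output above
is_feats <- parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin")
print(is_feats)
```

Read this way, the token "is" is an indicative (Mood=Ind), singular (Number=Sing), third-person (Person=3), present-tense (Tense=Pres), finite (VerbForm=Fin) verb form. Likewise, Definite=Def|PronType=Art marks "the" as a definite article, and Degree=Pos marks "weak" and "bright" as adjectives in the positive degree (neither comparative nor superlative). Recent versions of udpipe also appear to ship a `cbind_morphological` function that expands feats into separate columns; check the documentation of your installed version before relying on it.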
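This part of the Universal Dependencies annotation is not universal at all. The page you cite lists every morphological feature that can appear in any language in UD. Most of them do not apply to most languages, and some phenomena may appear more than once under different names in different treebanks. To make things trickier, some treebanks that UDPipe is trained on contain no morphological features at all. UDPipe can, of course, only output what it was able to learn from the treebank.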
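Since the features depend on the treebank behind the model, one practical approach is to tabulate which feature names actually occur in your own output rather than reading the full UD index. The sketch below assumes only base R; the data frame is typed in by hand from the `upos` and `feats` columns shown in the question, so that it runs without downloading a model, and the names `long` and `feature_table` are chosen for illustration.

```r
# upos and feats columns copied by hand from the output shown in the question
x <- data.frame(
  upos  = c("DET", "NOUN", "AUX", "ADJ", "CCONJ", "DET", "NOUN", "AUX", "ADJ"),
  feats = c("Definite=Def|PronType=Art", "Number=Sing",
            "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
            "Degree=Pos", NA,
            "Definite=Def|PronType=Art", "Number=Sing",
            "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
            "Degree=Pos"),
  stringsAsFactors = FALSE
)

# Drop tokens without features, then split each feats string into Name=Value pairs
has_feats <- !is.na(x$feats)
pairs <- strsplit(x$feats[has_feats], "|", fixed = TRUE)

# One row per (upos, feature name) occurrence
long <- data.frame(
  upos    = rep(x$upos[has_feats], lengths(pairs)),
  feature = sub("=.*$", "", unlist(pairs)),
  stringsAsFactors = FALSE
)

# Cross-tabulate: which feature names occur with which part of speech
feature_table <- table(long$upos, long$feature)
print(feature_table)
```

With a real annotation you would use the data frame returned by `udpipe()` in place of the hand-built `x`. The column names of the resulting table are then the only feature names you need to look up for that model.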
UD contains six different English treebanks, so there are also six different models in UDPipe. There is an overview at the UD webpage that explains how the treebanks differ and also explains the morphological features that are used for English. The default for English is UD_English-EWT.