How to interpret values of feats when using udpipe and R

With the R udpipe package, if we run this code:

library(udpipe)
x <- udpipe("The economy is weak but the outlook is bright. the property market will be booming next year", "english")

the result is:

  doc_id paragraph_id sentence_id                                      sentence start end term_id token_id   token   lemma  upos
1   doc1            1           1 The economy is weak but the outlook is bright     1   3       1        1     The     the   DET
2   doc1            1           1 The economy is weak but the outlook is bright     5  11       2        2 economy economy  NOUN
3   doc1            1           1 The economy is weak but the outlook is bright    13  14       3        3      is      be   AUX
4   doc1            1           1 The economy is weak but the outlook is bright    16  19       4        4    weak    weak   ADJ
5   doc1            1           1 The economy is weak but the outlook is bright    21  23       5        5     but     but CCONJ
6   doc1            1           1 The economy is weak but the outlook is bright    25  27       6        6     the     the   DET
7   doc1            1           1 The economy is weak but the outlook is bright    29  35       7        7 outlook outlook  NOUN
8   doc1            1           1 The economy is weak but the outlook is bright    37  38       8        8      is      be   AUX
9   doc1            1           1 The economy is weak but the outlook is bright    40  45       9        9  bright  bright   ADJ
  xpos                                                 feats head_token_id dep_rel deps            misc
1   DT                             Definite=Def|PronType=Art             2     det <NA>            <NA>
2   NN                                           Number=Sing             4   nsubj <NA>            <NA>
3  VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin             4     cop <NA>            <NA>
4   JJ                                            Degree=Pos             0    root <NA>            <NA>
5   CC                                                  <NA>             9      cc <NA>            <NA>
6   DT                             Definite=Def|PronType=Art             7     det <NA>            <NA>
7   NN                                           Number=Sing             9   nsubj <NA>            <NA>
8  VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin             9     cop <NA>            <NA>
9   JJ                                            Degree=Pos             4    conj <NA> SpacesAfter=\n

I have read through https://universaldependencies.org/ext-feat-index.html, but I still don't understand what the values in the feats column mean here.

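These are the morphological features of the words: for example, gender, number, and case for nouns; person, number, and aspect for verbs; and so on.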
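To make this concrete, each feats value is a list of Name=Value pairs separated by "|". For the token "is" in the output above, Mood=Ind means indicative mood, Number=Sing singular, Person=3 third person, Tense=Pres present tense, and VerbForm=Fin a finite verb form. The following is a minimal sketch in base R that splits one such string into a named vector; parse_feats is a hypothetical helper name used for illustration, not part of udpipe.

```r
# Split one feats string such as "Number=Sing|Person=3" into a named
# character vector. parse_feats is an illustrative helper, not a udpipe function.
parse_feats <- function(feats) {
  # Tokens without morphological features have NA in the feats column.
  if (is.na(feats)) return(character(0))
  pairs <- strsplit(feats, "|", fixed = TRUE)[[1]]
  kv <- strsplit(pairs, "=", fixed = TRUE)
  values <- vapply(kv, function(p) p[2], character(1))
  names(values) <- vapply(kv, function(p) p[1], character(1))
  values
}

# The feats value of the token "is" from the output above.
is_feats <- parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin")
print(is_feats)
```

The udpipe package also provides cbind_morphological(), which as far as I know expands the feats column of an annotated data frame into separate columns; check its help page for the exact arguments before relying on it.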

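This part of the Universal Dependencies annotation is not universal at all. The page you refer to lists every morphological feature that can appear in any language in UD. Most of them do not apply to most languages, and some phenomena may appear several times under different names in different treebanks. To make matters worse, some of the treebanks that UDPipe models are trained on contain no morphological features at all. UDPipe, of course, only outputs what it could learn from the treebank.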
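Because the full index is so much larger than what a single model emits, it can help to list only the feature names that actually occur in your own output and look up just those. Here is a rough sketch in base R; the feats vector is copied by hand from the nine rows printed in the question, and in practice you would use x$feats instead.

```r
# feats values copied from the nine rows of output in the question.
# With a real annotation you would use: feats <- x$feats
feats <- c(
  "Definite=Def|PronType=Art",
  "Number=Sing",
  "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
  "Degree=Pos",
  NA,
  "Definite=Def|PronType=Art",
  "Number=Sing",
  "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
  "Degree=Pos"
)

# Drop tokens without features, split into Name=Value pairs,
# and keep only the feature name in front of "=".
pairs <- unlist(strsplit(feats[!is.na(feats)], "|", fixed = TRUE))
feature_names <- sub("=.*$", "", pairs)

# Count how often each feature name occurs.
feature_counts <- table(feature_names)
print(feature_counts)
```

For this sentence only eight distinct feature names show up, so those are the only entries of the UD feature index you need to read.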

UD contains six different treebanks for English, so there are also six different models in UDPipe. There is an overview on the UD webpage that explains how the treebanks differ and also explains the morphological features that are used for English. The default for English is UD_English-EWT.
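If you want to be sure which treebank your feats values come from, you can name the model explicitly instead of passing just "english". The sketch below assumes that the model name "english-ewt" is available for download in your version of udpipe and that you have internet access on the first run; the sentence is shortened from the question.

```r
library(udpipe)

# Download the model trained on UD_English-EWT (network access needed once)
# and load it from the downloaded file.
model_info <- udpipe_download_model(language = "english-ewt")
model <- udpipe_load_model(file = model_info$file_model)

# Annotate a sentence and convert the result to a data frame.
annotation <- udpipe_annotate(model, x = "The economy is weak but the outlook is bright.")
annotation <- as.data.frame(annotation)

# Show only the columns relevant to the morphological features.
print(annotation[, c("token", "upos", "feats")])
```

Swapping in another English model name should let you compare how the feats column differs between treebanks; see ?udpipe_download_model for the names your installed version accepts.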