Stanford Java NLP constituency label abbreviations
Using the Stanford Java CoreNLP library, I have this:
String text = "My name is Anthony";
CoreDocument doc = new CoreDocument(text);
pipeline.annotate(doc);
// Iterating over a Tree visits every node: phrases, POS tags, and the leaf words.
for (Tree t : doc.sentences().get(0).constituencyParse()) {
    String tmp = "";
    for (Word w : t.yieldWords()) {
        tmp = tmp + " " + w.word();
    }
    System.out.println(t.label().toString() + " - " + WordParts.getValue(t.label().toString()) + " - " + tmp);
}
Right now, the program outputs the following:
ROOT - INVALID - My name is Anthony
S - INVALID - My name is Anthony
NP - INVALID - My name
PRP$ - Possessive pronoun - My
My-1 - INVALID - My
NN - Singular noun - name
name-2 - INVALID - name
VP - INVALID - is Anthony
VBZ - 3rd person singular present verb - is
Subject: Anthony
is-3 - INVALID - is
NP - INVALID - Anthony
NNP - Proper singular noun - Anthony
Anthony-4 - INVALID - Anthony
The abbreviations in WordParts.java come from this post (Java Stanford NLP: Part of Speech labels?) and the class file can be found here: (https://github.com/AJ4real/References/blob/master/WordParts.java)
I know the labels are not all Parts of Speech, because some of the values return INVALID, so how can I find the full terms for the abbreviations that come from t.label().toString()?
The rest are Penn Treebank phrase categories. For example, see here:
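As a rough sketch of how the lookup could be extended to cover the phrase-level labels, something like the following might work. The class name PhraseLabels and the method describe are made up for illustration and are not part of CoreNLP or of WordParts.java; the descriptions are paraphrased from the usual Penn Treebank bracket label list and only a subset is included. ROOT is the top node the Stanford parser adds rather than a Penn Treebank category. Unknown labels fall back to "INVALID" to mirror the output in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of CoreNLP): maps clause- and phrase-level
// constituency labels to readable names. Only a subset of labels is listed.
final class PhraseLabels {
    private static final Map<String, String> LABELS = new LinkedHashMap<>();

    static {
        // Top node added by the parser (assumption: not a Penn Treebank category).
        LABELS.put("ROOT", "Root of the parse tree");
        // Clause level
        LABELS.put("S", "Simple declarative clause");
        LABELS.put("SBAR", "Clause introduced by a subordinating conjunction");
        LABELS.put("SBARQ", "Direct question introduced by a wh-word or wh-phrase");
        LABELS.put("SINV", "Inverted declarative sentence");
        LABELS.put("SQ", "Inverted yes/no question");
        // Phrase level
        LABELS.put("ADJP", "Adjective phrase");
        LABELS.put("ADVP", "Adverb phrase");
        LABELS.put("CONJP", "Conjunction phrase");
        LABELS.put("FRAG", "Fragment");
        LABELS.put("INTJ", "Interjection");
        LABELS.put("NP", "Noun phrase");
        LABELS.put("PP", "Prepositional phrase");
        LABELS.put("PRN", "Parenthetical");
        LABELS.put("PRT", "Particle");
        LABELS.put("QP", "Quantifier phrase");
        LABELS.put("VP", "Verb phrase");
        LABELS.put("WHADJP", "Wh-adjective phrase");
        LABELS.put("WHADVP", "Wh-adverb phrase");
        LABELS.put("WHNP", "Wh-noun phrase");
        LABELS.put("WHPP", "Wh-prepositional phrase");
        LABELS.put("X", "Unknown or uncertain constituent");
    }

    private PhraseLabels() {
    }

    // Returns the readable name for a label, or "INVALID" when it is not a
    // known phrase category (for example a POS tag or a leaf such as "My-1").
    static String describe(String label) {
        return LABELS.getOrDefault(label, "INVALID");
    }

    public static void main(String[] args) {
        for (String label : new String[] {"ROOT", "S", "NP", "VP", "My-1"}) {
            System.out.println(label + " - " + describe(label));
        }
    }
}
```

In the loop from the question you could try the POS lookup first and fall back to PhraseLabels.describe(...) when it returns INVALID. The remaining INVALID rows such as My-1 and name-2 are not labels at all: they are the leaf nodes (the words themselves with their token index), so skipping nodes where t.isLeaf() is true should remove them from the output.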