How to keep punctuation in Stanford dependency parser
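I am using Stanford CoreNLP (the 01.2016 release) and I would like to keep the punctuation in the dependency relations. I have found some ways to do this when running from the command line, but I have not found anything about doing it from the Java code that extracts the dependencies.
Here is my current code. It works, but it does not include punctuation: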
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}
Any kind of dependencies is fine for me: basic, typed, collapsed, and so on.
I just want the punctuation to be included.
Thanks in advance,
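You are doing quite a bit of extra work here, because you run the parser once through CoreNLP and then a second time by calling lp.apply(words).
The easiest way to get a dependency tree/graph with punctuation is to use the CoreNLP option parse.keepPunct, as follows.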
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
// Keep punctuation tokens in the dependency graphs
props.setProperty("parse.keepPunct", "true");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
// The sentences are read from the annotated document
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // Pick whichever representation you want
    SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
    SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
    SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}
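As a minimal sketch of the configuration step only, the properties above can be assembled in a small helper so the parse.keepPunct setting lives in one place. The class name KeepPunctProps, the method build, and the model paths are hypothetical and are not part of CoreNLP; the snippet uses only java.util.Properties and does not load or run the pipeline itself.

```java
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP: it only assembles the settings
// used in the answer above.
class KeepPunctProps {

    // posModel and parseModel are placeholder paths supplied by the caller.
    static Properties build(String posModel, String parseModel) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("pos.model", posModel);
        props.setProperty("parse.model", parseModel);
        // The option that asks the parse annotator to keep punctuation
        props.setProperty("parse.keepPunct", "true");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder paths, for illustration only
        Properties props = build("path/to/pos.model", "path/to/parse.model");
        System.out.println("parse.keepPunct = " + props.getProperty("parse.keepPunct"));
    }
}
```

The returned Properties object would then be passed to new StanfordCoreNLP(props), exactly as in the code above.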
The sentence annotation object stores the dependency trees/graphs as a SemanticGraph. If you want a list of TypedDependency objects, use the typedDependencies() method. For example,
// typedDependencies() returns a Collection, so declare the variable accordingly
Collection<TypedDependency> dependencies = basicDeps.typedDependencies();