configuring a separate model jar in stanford nlp
I have implemented logic that uses Stanford NLP to extract locations from particular English sentences. I am using the following jars:
stanford-corenlp-3.2.0.jar
stanford-corenlp-3.2.0-models.jar
The logic I wrote is as follows:
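import java.util.Properties;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class NLP_initializer implements ServletContextListener {
    public static StanfordCoreNLP snlp;
    /**
     * @see ServletContextListener#contextInitialized(ServletContextEvent)
     */
    public void contextInitialized(ServletContextEvent arg0) {
        Properties props = new Properties();
        props.put("annotators", "tokenize,ssplit,pos,lemma,parse,ner,dcoref");
        snlp = new StanfordCoreNLP(props); // assign the static field; a new local variable here would leave snlp null
    }
    public void contextDestroyed(ServletContextEvent arg0) { }
}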
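However, because of case-sensitivity problems, I was advised to use stanford-corenlp-caseless-2015-04-20-models.jar instead of stanford-corenlp-3.2.0.jar.
With the code above, the models jar that gets loaded by default is stanford-corenlp-3.2.0-models.jar.
Now, however, I want to configure the pipeline to use the caseless models, i.e. stanford-corenlp-caseless-2015-04-20-models.jar.
Please advise how to configure this in Java code.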
I tried Gabor's solution, but I got the following exception:
SEVERE: Exception sending context initialized event to listener instance of class servlets.NLP_initializer
java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.pipeline.StanfordCoreNLP.create(StanfordCoreNLP.java:493)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:260)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at servlets.NLP_initializer.contextInitialized(NLP_initializer.java:34)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4887)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5381)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:749)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:283)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:247)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:78)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:62)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.create(StanfordCoreNLP.java:491)
... 14 more
Caused by: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:419)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:744)
... 19 more
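The innermost cause says the tagger file `edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger` could not be found. That is the default `pos.model` path, not the caseless one, which suggests that the caseless `pos.model` property never reached the `Properties` object passed to the constructor, and that the regular models jar is not visible to the webapp either. A quick JDK-only way to see which model files the webapp can actually load is to ask the context class loader for each resource path. This is a minimal diagnostic sketch; the names `ModelCheck` and `isOnClasspath` are hypothetical, and the paths are copied from the stack trace and the caseless documentation.

```java
import java.net.URL;

public class ModelCheck {

    /** Returns true if the resource path can be found by the current thread's class loader. */
    public static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class
            loader = ModelCheck.class.getClassLoader();
        }
        URL url = loader.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Default tagger path (from the stack trace) followed by caseless model paths (from the docs)
        String[] paths = {
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger",
            "edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger",
            "edu/stanford/nlp/models/lexparser/englishPCFG.caseless.ser.gz",
            "edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz"
        };
        for (String p : paths) {
            System.out.println((isOnClasspath(p) ? "found:   " : "MISSING: ") + p);
        }
    }
}
```

Called from `contextInitialized` (or run with the same classpath as the webapp), any `MISSING:` line points at a jar that is not where the pipeline expects it.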
See http://nlp.stanford.edu/software/corenlp.shtml#caseless
Copied from the documentation:
It is possible to run StanfordCoreNLP with tagger, parser, and NER models that ignore capitalization. In order to do this, download the caseless models package. Be sure to include the path to the case insensitive models jar in the -cp classpath flag as well. Then, set properties which point to these models as follows:
-pos.model edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger
-parse.model edu/stanford/nlp/models/lexparser/englishPCFG.caseless.ser.gz
-ner.model edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz,edu/stanford/nlp/models/ner/english.muc.7class.caseless.distsim.crf.ser.gz,edu/stanford/nlp/models/ner/english.conll.4class.caseless.distsim.crf.ser.gz
In your code, these paths can be set as follows:
// Set these on the same Properties object before calling new StanfordCoreNLP(props)
props.put("pos.model", "edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/englishPCFG.caseless.ser.gz");
// Multiple NER models are given as a single comma-separated list
props.put("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz,edu/stanford/nlp/models/ner/english.muc.7class.caseless.distsim.crf.ser.gz,edu/stanford/nlp/models/ner/english.conll.4class.caseless.distsim.crf.ser.gz");
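As the documentation's "as well" suggests, the caseless jar is added next to the regular models jar rather than replacing it; annotators such as `dcoref` presumably still load their resources from the regular jar. The model properties also have to be on the same `Properties` object that is passed to the `StanfordCoreNLP` constructor. Below is a minimal JDK-only sketch of a helper that builds them, assuming `ner.model` takes one comma-separated value; `CaselessConfig` and `caselessProperties` are hypothetical names.

```java
import java.util.Properties;

public class CaselessConfig {

    // Caseless model paths as listed in the CoreNLP caseless documentation
    static final String POS_MODEL =
        "edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger";
    static final String PARSE_MODEL =
        "edu/stanford/nlp/models/lexparser/englishPCFG.caseless.ser.gz";
    static final String[] NER_MODELS = {
        "edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz",
        "edu/stanford/nlp/models/ner/english.muc.7class.caseless.distsim.crf.ser.gz",
        "edu/stanford/nlp/models/ner/english.conll.4class.caseless.distsim.crf.ser.gz"
    };

    /** Builds pipeline properties that point pos, parse and ner at the caseless models. */
    public static Properties caselessProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse,ner,dcoref");
        props.setProperty("pos.model", POS_MODEL);
        props.setProperty("parse.model", PARSE_MODEL);
        // Assumption: several NER models are passed as a single comma-separated value
        props.setProperty("ner.model", String.join(",", NER_MODELS));
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key/value pairs for inspection
        caselessProperties().list(System.out);
    }
}
```

In the listener this would be used as `snlp = new StanfordCoreNLP(CaselessConfig.caselessProperties());`. One more thing worth checking if loading still fails: the caseless jar is dated 2015-04-20, much later than CoreNLP 3.2.0, and model files are generally built for a specific CoreNLP release, so upgrading stanford-corenlp-3.2.0.jar to the matching release may be necessary.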