Extracting age-related information using NLP
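I am new to NLP and have been trying to extract age-related information from raw text. I searched Google but did not find a reliable library for this in any language. Any help would be appreciated. The language is not a constraint: Java, Python, or anything else is fine. Thanks in advance!
Update:
I tried adding the annotators mentioned in the Stanford help to my Java parser, but I am hitting the following exception: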
ERROR: cannot create CorefAnnotator!
java.lang.RuntimeException: Error creating coreference system
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:58)
at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220)
at edu.stanford.nlp.pipeline.AnnotatorFactories.create(AnnotatorFactories.java:515)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:139)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:135)
at com.dateparser.SUtime.SUAgeParser.makeNumericPipeline(SUAgeParser.java:85)
at com.dateparser.SUtime.SUAgeParser.<clinit>(SUAgeParser.java:60)
Caused by: java.lang.RuntimeException: Error initializing coref system
at edu.stanford.nlp.scoref.StatisticalCorefSystem.<init>(StatisticalCorefSystem.java:36)
at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:24)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48)
... 9 more
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/hcoref/md-model.ser" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:323)
at edu.stanford.nlp.hcoref.md.DependencyCorefMentionFinder.<init>(DependencyCorefMentionFinder.java:38)
at edu.stanford.nlp.hcoref.CorefDocMaker.getMentionFinder(CorefDocMaker.java:149)
at edu.stanford.nlp.hcoref.CorefDocMaker.<init>(CorefDocMaker.java:61)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.<init>(StatisticalCorefSystem.java:34)
... 11 more
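I upgraded to version 1.6.0 and added stanford-corenlp-models-current.jar to the classpath. Please let me know if I am missing something.
Update 1:
After upgrading to 3.9.1, the exception is fixed. However, the output I get is a per:duration relation instead of per:age.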
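    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.CoreMap;

    // Assumed: the pipeline is held in a static field (the original snippet omits the declaration).
    private static final AnnotationPipeline pipeline = makePipeline();

    private static AnnotationPipeline makePipeline() {
        Properties props = new Properties();
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,depparse,coref,kbp");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        return pipeline;
    }

    public static void parse(String str) {
        try {
            Annotation doc = new Annotation(str);
            pipeline.annotate(doc);
            ArrayList<CoreMap> resultRelations = new ArrayList<CoreMap>();
            // Entity mentions found in the document
            List<CoreMap> mentionsAnnotations = doc.get(MentionsAnnotation.class);
            for (CoreMap currentCoreMap : mentionsAnnotations) {
                // Mention text, character offsets, and the mention's NER tag
                System.out.println(currentCoreMap.get(TextAnnotation.class));
                System.out.println(currentCoreMap.get(CharacterOffsetBeginAnnotation.class));
                System.out.println(currentCoreMap.get(CharacterOffsetEndAnnotation.class));
                System.out.println(currentCoreMap.get(NamedEntityTagAnnotation.class));
            }
        } catch (Exception e) {
            // Report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }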
Is this expected behavior, or am I doing something wrong?
You may find the KBP relation extractor useful.
Example text:
Joe Smith is 58 years old.
Command:
java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,depparse,coref,kbp -file example.txt -outputFormat text
This should attach Joe Smith to 58 years old with the per:age relation.
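The KBP pipeline above needs the models jar and a large heap. For a quick sanity check on the simplest phrasings, a plain regular expression can serve as a rough baseline. This is a different technique, not the KBP extractor: it is a minimal sketch using only the Java standard library, it does not link the age to a person, and the class name `AgeRegexBaseline`, the method `extractAges`, and the pattern itself are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of CoreNLP: a regex baseline for simple age phrases. */
final class AgeRegexBaseline {

    // Matches "58 years old", "58-year-old", "58 yrs old", and "age 58" / "aged 58".
    private static final Pattern AGE = Pattern.compile(
            "\\b(\\d{1,3})[\\s-]*(?:years?|yrs?)[\\s-]*old\\b|\\baged?\\s+(\\d{1,3})\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns every age found in the text, in order of appearance. */
    static List<Integer> extractAges(String text) {
        List<Integer> ages = new ArrayList<>();
        Matcher m = AGE.matcher(text);
        while (m.find()) {
            // Exactly one of the two capture groups is set for any match.
            String digits = m.group(1) != null ? m.group(1) : m.group(2);
            ages.add(Integer.parseInt(digits));
        }
        return ages;
    }

    public static void main(String[] args) {
        // prints [58]
        System.out.println(extractAges("Joe Smith is 58 years old."));
    }
}
```

A pattern like this will miss indirect phrasing ("turned 58 last week", "in her late fifties"), which is where a trained relation extractor such as KBP is the better fit.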