Add custom rules for parsing quarters to SUTime
I am following the official instructions to add custom SUTime rules for fiscal-year quarters (e.g. Q1, Q2, Q3, and Q4).
I used the default defs.sutime.txt and english.sutime.txt as templates for my own rule files.
I appended the following to my defs.sutime.txt:
// Financial Quarters
FYQ1 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ1",
  value: TimeWithRange(TimeRange(IsoDate(ANY,10,1), IsoDate(ANY,12,31), QUARTER))
}
FYQ2 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ2",
  value: TimeWithRange(TimeRange(IsoDate(ANY,1,1), IsoDate(ANY,3,31), QUARTER))
}
FYQ3 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ3",
  value: TimeWithRange(TimeRange(IsoDate(ANY,4,1), IsoDate(ANY,6,30), QUARTER))
}
FYQ4 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ4",
  value: TimeWithRange(TimeRange(IsoDate(ANY,7,1), IsoDate(ANY,9,30), QUARTER))
}
and appended the following to my english.sutime.txt:
# Financial Quarters
FISCAL_YEAR_QUARTER_MAP = {
  "Q1": FYQ1,
  "Q2": FYQ2,
  "Q3": FYQ3,
  "Q4": FYQ4
}
FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP = {
  "Q1": 1,
  "Q2": 0,
  "Q3": 0,
  "Q4": 0
}
$FiscalYearQuarterTerm = CreateRegex(Keys(FISCAL_YEAR_QUARTER_MAP))
{
  matchWithResults: TRUE,
  pattern: ((/$FiscalYearQuarterTerm/) (FY)? (/(FY)?([0-9]{4})/)),
  result: TemporalCompose(INTERSECT, IsoDate(Subtract({type: "NUMBER", value: $.matchResults[0].word.group(2)}, FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP[[0].word]), ANY, ANY), FISCAL_YEAR_QUARTER_MAP[[0].word])
}
{
  pattern: ((/$FiscalYearQuarterTerm/)),
  result: FISCAL_YEAR_QUARTER_MAP[[0].word]
}
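To make the intended result of these rules concrete, here is a minimal plain-Java sketch of the date arithmetic they describe: the quarter label picks a three-month range, and the year offset shifts Q1 into the previous calendar year. It does not use SUTime at all; the class name FiscalQuarterSketch and the method range are hypothetical names introduced only for this illustration, and the month and offset tables are copied from the rule files above.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP: mirrors the FYQ1..FYQ4 definitions
// and FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP from the rule files above.
class FiscalQuarterSketch {

    // First calendar month of each fiscal quarter (FYQ1 starts on October 1).
    static final Map<String, Integer> START_MONTH =
            Map.of("Q1", 10, "Q2", 1, "Q3", 4, "Q4", 7);

    // Number of years to subtract from the fiscal year for each quarter.
    static final Map<String, Integer> YEAR_OFFSET =
            Map.of("Q1", 1, "Q2", 0, "Q3", 0, "Q4", 0);

    // Returns {firstDay, lastDay} of the given fiscal quarter.
    static LocalDate[] range(String quarter, int fiscalYear) {
        if (!START_MONTH.containsKey(quarter)) {
            throw new IllegalArgumentException("Unknown quarter: " + quarter);
        }
        int calendarYear = fiscalYear - YEAR_OFFSET.get(quarter);
        int startMonth = START_MONTH.get(quarter);
        LocalDate start = LocalDate.of(calendarYear, startMonth, 1);
        // Each quarter spans three months, so it ends two months after it starts.
        LocalDate end = YearMonth.of(calendarYear, startMonth + 2).atEndOfMonth();
        return new LocalDate[] {start, end};
    }

    public static void main(String[] args) {
        LocalDate[] q1 = range("Q1", 2020);
        System.out.println("Q1 2020 -> " + q1[0] + " to " + q1[1]);
    }
}
```

Under these assumptions, "Q1 2020" corresponds to 2019-10-01 through 2019-12-31, which is the range the SUTime rules are meant to produce.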
I still cannot correctly parse expressions such as "Q1 2020".
How do I correctly add rules for parsing fiscal-year quarters (e.g. "Q1")?
Here is my complete code:
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;

public class SUTimeSoExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sutime.includeRange", "true");
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.rules", "./defs.sutime.txt,./english.sutime.txt");

        AnnotationPipeline pipeline = new AnnotationPipeline();
        pipeline.addAnnotator(new TokenizerAnnotator(false));
        pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
        pipeline.addAnnotator(new POSTaggerAnnotator(false));
        pipeline.addAnnotator(new TimeAnnotator("sutime", props));

        String input = "Stuff for Q1 2020";
        Annotation annotation = new Annotation(input);
        annotation.set(CoreAnnotations.DocDateAnnotation.class, "2020-06-01");
        pipeline.annotate(annotation);

        System.out.println(annotation.get(CoreAnnotations.TextAnnotation.class));
        List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
        for (CoreMap cm : timexAnnsAll) {
            System.out.println(cm // match
                    + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal() // parsed value
            );
        }
    }
}
Note that I removed the default defs.sutime.txt and english.sutime.txt files from the Stanford CoreNLP models JAR to avoid this issue.
There is a Java code example here:
https://stanfordnlp.github.io/CoreNLP/sutime.html
If you follow that example, it should work. The main point is to build your pipeline this way:
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
props.setProperty("ner.docDate.usePresent", "true");
// this shuts off the statistical models if you want to run only SUTime
props.setProperty("ner.rulesOnly", "true");
// add your sutime properties as in your example
...
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
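Also make sure you are using version 4.0.0.
If you want to run only SUTime without the statistical models, you can set ner.rulesOnly to true.
You can use one of the several ner.docDate properties, or simply set the document date on the annotation before running the pipeline.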