使用 Ruta,获取下一行注释关键字中的数据
Using Ruta, get a data present in next line of annotated keyword
如何获取数据 below/above 注释关键字出现在其他行?我可以注释关键字但无法获取信息
示例文本:
Underwriter's Name Appraiser's Name Appraisal Company Name
Alice Wheaton Cooper Bruce Banner Stark Industries
代码
TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;
EXEC(PlainTextAnnotator, {Line});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
REMOVERETAINTYPE(WS);
Document{->FILTERTYPE(SPECIAL)};
DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
DECLARE UnderWriterName(String label, String value);
CW{REGEXP("\bUnderwriter") -> UnderWriterKeyword};
CW{REGEXP("Name")->NameKeyword};
(UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
ADDRETAINTYPE(SPACE);
Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
(CW SPACE)+ {-> MARK(UnderWriterName)};
};
REMOVERETAINTYPE(SPACE)
预期输出:
Underwriter's Name: Alice Wheaton Cooper
Appraiser's Name: Bruce Banner
Appraisal Company Name: Stark Industries
请建议在 RUTA 中是否可行?如果为真,如何获取数据?
看看 Ruta 中的 inlined rules。
您可以在一行中定义您的条件,并在下一行中注释所需的信息:
Line{CONTAINS(UserName)} Line -> { //your logic here };
TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;
DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);
DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword,
AppraisalCompanyNameKeyword;
"Underwriter's Name" -> UnderWriterNameKeyword ( "label" = "UnderWriter
Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ( "label" =
"Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ( "label" =
"Appraisal Company Name");
DECLARE Entry(Keyword keyword);
EXEC(PlainTextAnnotator, {Line,Paragraph});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};
SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);
INT index = 0;
BLOCK(structure) Line{}{
ASSIGN(index, 0);
Line{STARTSWITH(Paragraph) -> Header};
c:Cell{-> c.column = index, index = index + 1};
}
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
# c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};
DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;
FOREACH(entry) Entry{}{
entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}
最重要的规则如下:
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
# c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};
它包含三个规则元素,Header
、#
和 Cell
,并且这样工作:
- 规则开始与
Cell
规则元素匹配,因为它被标记为带有 @
的锚点。
- 此规则元素匹配所有
Cell
注释,无论是否属于 Header
注释。它从满足此条件的第一个 Cell
注释开始,并将其称为 "c".
- 下一个规则元素是
#
,它一直匹配到下一个规则元素能够匹配为止。
- 如果内联规则能够在
Header
注释范围内匹配,则下一个规则元素匹配 Header
注释。内联规则匹配在此范围内记为 "hc" 的 Cell
注释,这些注释对特征 column
具有相同的值。匹配成功,如果它包含一个 Keyword
记住为 "k".
- 如果这三个规则元素匹配成功,则会应用操作。
- 第一个操作在
Cell
注释的跨度上创建一个名为 "e" 的 Entry
注释。
- 第二个操作将关键字分配给
Entry
功能 keyword
。
总而言之,该规则为不属于 header 的每个 Cell
注释创建一个 Entry
注释,并分配相应的 header 关键字列以定义条目的类型。
如何获取数据 below/above 注释关键字出现在其他行?我可以注释关键字但无法获取信息
示例文本:
Underwriter's Name Appraiser's Name Appraisal Company Name
Alice Wheaton Cooper Bruce Banner Stark Industries
代码
TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;
EXEC(PlainTextAnnotator, {Line});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
REMOVERETAINTYPE(WS);
Document{->FILTERTYPE(SPECIAL)};
DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
DECLARE UnderWriterName(String label, String value);
CW{REGEXP("\bUnderwriter") -> UnderWriterKeyword};
CW{REGEXP("Name")->NameKeyword};
(UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
ADDRETAINTYPE(SPACE);
Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
(CW SPACE)+ {-> MARK(UnderWriterName)};
};
REMOVERETAINTYPE(SPACE)
预期输出:
Underwriter's Name: Alice Wheaton Cooper
Appraiser's Name: Bruce Banner
Appraisal Company Name: Stark Industries
请建议在 RUTA 中是否可行?如果为真,如何获取数据?
看看 Ruta 中的 inlined rules。
您可以在一行中定义您的条件,并在下一行中注释所需的信息:
Line{CONTAINS(UserName)} Line -> { //your logic here };
TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;
DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);
DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword,
AppraisalCompanyNameKeyword;
"Underwriter's Name" -> UnderWriterNameKeyword ( "label" = "UnderWriter
Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ( "label" =
"Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ( "label" =
"Appraisal Company Name");
DECLARE Entry(Keyword keyword);
EXEC(PlainTextAnnotator, {Line,Paragraph});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};
SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);
INT index = 0;
BLOCK(structure) Line{}{
ASSIGN(index, 0);
Line{STARTSWITH(Paragraph) -> Header};
c:Cell{-> c.column = index, index = index + 1};
}
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
# c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};
DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;
FOREACH(entry) Entry{}{
entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}
最重要的规则如下:
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
# c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};
它包含三个规则元素,Header
、#
和 Cell
,并且这样工作:
- 规则开始与
Cell
规则元素匹配,因为它被标记为带有@
的锚点。 - 此规则元素匹配所有
Cell
注释,无论是否属于Header
注释。它从满足此条件的第一个Cell
注释开始,并将其称为 "c". - 下一个规则元素是
#
,它一直匹配到下一个规则元素能够匹配为止。 - 如果内联规则能够在
Header
注释范围内匹配,则下一个规则元素匹配Header
注释。内联规则匹配在此范围内记为 "hc" 的Cell
注释,这些注释对特征column
具有相同的值。匹配成功,如果它包含一个Keyword
记住为 "k". - 如果这三个规则元素匹配成功,则会应用操作。
- 第一个操作在
Cell
注释的跨度上创建一个名为 "e" 的Entry
注释。 - 第二个操作将关键字分配给
Entry
功能keyword
。
总而言之,该规则为不属于 header 的每个 Cell
注释创建一个 Entry
注释,并分配相应的 header 关键字列以定义条目的类型。