Need to match an annotation feature in UIMA Ruta
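I need to match on a feature of one annotation and also mark the second annotation whose feature has the same value. I have tried, but I am facing two problems.
Problem 1:
The SeperateDA annotation values got reduced. I think this is due to dictRemoveWS.
Problem 2:
Only the last match is shown (probably due to a looping problem).
Sample file 1: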
Arash Alipour
Rahul Bhargava
Lisette I.S. Wintgens
B. Rahul
Alipour A
Ali Aldabahi
M. Naziruddin Khan
Martin J. Swaans
Naziruddin Khan
Expected output for file 1:
Rahul
Alipour
Naziruddin
Khan
Sample file 2:
M. Naziruddin Khan
Arash Alipour
Rahul Bhargava
Lisette I.S. Wintgens
Alipour A
Ali Aldabahi
M. Naziruddin Khan
Expected output for file 2:
Alipour
Naziruddin
Khan
My script:
PACKAGE uima.ruta.example;
DECLARE SINGLEINITIAL;
CW{REGEXP(".")->MARK(SINGLEINITIAL)};
DECLARE SeperateDA;
DECLARE DA;
"Arash Alipour"->DA;
"Lisette I.S. Wintgens"->DA;
"Alipour A"->DA;
"Rahul Bhargava"->DA;
"M. Naziruddin Khan"->DA;
"B. Rahul"->DA;
"Ali Aldabahi"->DA;
"A. S. Al Dwayyan"->DA;
"Lucas V.A. Boersma"->DA;
"Jippe C. Bal"->DA;
"Benno J.W.M. Rensing"->DA;
"Martin J. Swaans"->DA;
BLOCK(DocAuth) DA{}
{
CW{-PARTOF(SINGLEINITIAL)-> MARK(SeperateDA)};
}
DECLARE RepeatedDA(STRING auth);
STRING MatchedAuth;
SeperateDA{->MARK(RepeatedDA),MATCHEDTEXT(MatchedAuth)}->{RepeatedDA{->RepeatedDA.auth=MatchedAuth};};
STRING auth;
FOREACH(RepAuth) RepeatedDA{}
{
(da1:RepeatedDA {->UNMARK(RepeatedDA)}# da2:RepeatedDA){da1.auth != da2.auth};
}
I have also tried a similar approach:
da:RepeatedDA{->da.auth = RepeatedDA.auth};
FOREACH(RepAuth, true) RepeatedDA{}
{
# da:RepeatedDA{->auth = da.auth, LOG(" auth-" +auth)};
da:RepeatedDA {auth != da.auth-> UNMARK(da)};
}
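My goal is to remove the near-duplicate names from DA. For example, in the sample file above, both Rahul Bhargava and B. Rahul are in DA, but I need only Rahul Bhargava in DA.
There seems to be a problem with the logic of your rule.
da1:RepeatedDA # da2:RepeatedDA
Here da2 always matches the directly following RepeatedDA/SeperateDA, because the values of the auth feature differ. Thus, the rule applies almost every time.
Try this: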
DECLARE SINGLEINITIAL;
CW{REGEXP(".")->MARK(SINGLEINITIAL)};
DECLARE SeperateDA (STRING auth);
DECLARE DA;
"Arash Alipour"->DA;
"Lisette I.S. Wintgens"->DA;
"Alipour A"->DA;
"Rahul Bhargava"->DA;
"M. Naziruddin Khan"->DA;
"B. Rahul"->DA;
"Ali Aldabahi"->DA;
"A. S. Al Dwayyan"->DA;
"Lucas V.A. Boersma"->DA;
"Jippe C. Bal"->DA;
"Benno J.W.M. Rensing"->DA;
"Martin J. Swaans"->DA;
BLOCK(DocAuth) DA{}
{
CW{-PARTOF(SINGLEINITIAL)-> CREATE(SeperateDA, "auth" = CW.ct)};
}
DECLARE RepeatedDA;
da1:SeperateDA{-> RepeatedDA} # da2:SeperateDA{da1.auth == da2.auth};
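The comparison this rule relies on (two name tokens with equal `auth` values) can be checked outside Ruta. The following is a minimal Python sketch, not Ruta code and not part of the script above. It assumes that every line of the sample file is an author name and that a "name token" is a capitalized word longer than one letter. The helper names `name_tokens` and `repeated_tokens` are hypothetical. Unlike the rule above, which puts RepeatedDA on da1 (the earlier occurrence), the sketch reports the later occurrence, which is the order used in the expected outputs of the question.

```python
import re

FILE_1 = [
    "Arash Alipour", "Rahul Bhargava", "Lisette I.S. Wintgens",
    "B. Rahul", "Alipour A", "Ali Aldabahi", "M. Naziruddin Khan",
    "Martin J. Swaans", "Naziruddin Khan",
]

FILE_2 = [
    "M. Naziruddin Khan", "Arash Alipour", "Rahul Bhargava",
    "Lisette I.S. Wintgens", "Alipour A", "Ali Aldabahi",
    "M. Naziruddin Khan",
]


def name_tokens(lines):
    """Collect capitalized words in document order, skipping single-letter initials."""
    tokens = []
    for line in lines:
        for word in re.findall(r"[A-Za-z]+", line):
            if word[0].isupper() and len(word) > 1:
                tokens.append(word)
    return tokens


def repeated_tokens(tokens):
    """Return each token that was already seen earlier, in order of its first repeat."""
    seen = set()
    repeats = []
    for tok in tokens:
        if tok in seen and tok not in repeats:
            repeats.append(tok)
        seen.add(tok)
    return repeats


if __name__ == "__main__":
    print(repeated_tokens(name_tokens(FILE_1)))
    print(repeated_tokens(name_tokens(FILE_2)))
```

Under these assumptions the sketch yields Rahul, Alipour, Naziruddin, Khan for sample file 1 and Alipour, Naziruddin, Khan for sample file 2, which agrees with the expected outputs listed in the question.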
Disclaimer: I am a developer of UIMA Ruta.