FirstToken is not found for some references (UIMA Ruta)

FirstToken is not found for some references (the ones that contain a space at the end).

Script:

DECLARE FirstToken, LastToken;

BLOCK(InRef) Reference{}{
    ANY{POSITION(Reference,1) -> MARK(FirstToken)};
    Document{-> MARKLAST(LastToken)};
}

Input file:

1.  Ferreira, F.R., Prado, S.D., Carvalho, M.C, and Kraemer, F.B. (2015). Biopower and biopolitics in the field of food and nutrition. Revista de Nutrição, 28(1), 109-119. Available at http://dx.doi.org/10.1590/1415-52732015000100010. 
2.  Ali, S. (2007). Feminism and postcolonialism: Knowledge/politics. Ethnic and Racial Studies, 30(2), 191–212.  
3.  Forbes, D.A., King, K.M., Kushner, K.E., Letourneau, N.L., Myrick, A.F., and Profetto-McGrath, J. (1999). Warrantable evidence in nursing science. Journal of Advanced Nursing, 29(2), 373–379.

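Annotations that start or end with something invisible are themselves invisible. This definition may sound counterintuitive, but it is required for sequential matching.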

This most often happens when an annotation starts or ends with whitespace. It is recommended to trim that whitespace from the annotations, for example:

RETAINTYPE(WS); // or RETAINTYPE(SPACE, BREAK,...);
Reference{-> TRIM(WS)};
RETAINTYPE;

You can also work with annotations that end with a space if you make spaces visible:

RETAINTYPE(SPACE);

Apart from that, you can use the MARKFIRST action, just like the MARKLAST action, instead of the POSITION condition, which is really slow.
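
Both suggestions can be combined. The following is only a minimal sketch, assuming the Reference annotations already exist as in the question and that the trailing whitespace is covered by WS annotations. It has not been run against the input above:

```
DECLARE FirstToken, LastToken;

// Make whitespace visible just long enough to trim it from the references.
RETAINTYPE(WS);
Reference{-> TRIM(WS)};
RETAINTYPE; // restore the default filtering

// Assumption: after trimming, every Reference is visible again,
// so no BLOCK and no POSITION condition should be needed.
Reference{-> MARKFIRST(FirstToken), MARKLAST(LastToken)};
```

Here MARKFIRST and MARKLAST are applied directly to each matched Reference, which avoids evaluating POSITION for every token inside the block.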

Disclaimer: I am a developer of UIMA Ruta.