How can I access the document filename or URL in a custom UIMA annotator using IBM Content Analytics?

Question

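I am writing a custom Java annotator for a UIMA pipeline in Watson Explorer Content Analytics.

As far as I know, there are two places where I could try to get the URL or filename of the document currently being processed.

initialize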

public class CustomAnnotator extends JCasAnnotator_ImplBase {

@Override
public void initialize(UimaContext aContext)
        throws ResourceInitializationException {
    super.initialize(aContext);
.... HERE MAYBE ? ....

or

process

@Override
public void process(JCas jcas) throws AnalysisEngineProcessException {
    try {
.... HERE ....

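I have tried several options:

via the context in the initialize method (when the pipeline runs on the server I can get the PearID, for example),
via the Sofa in the process method (e.g. jcas.getSofa().getSofaURI())

I also found SourceDocumentInformation, but that type is only an example. Its getUri() method looks promising, but I would be depending on IBM to call setUri(String)...

So far I have had no success, and I hope I am overlooking something...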

Answer 1

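I asked the same question on IBM dwAnswers. In short, when the pipeline runs in the Watson Explorer Content Analytics server, you have access to multiple views. For metadata, we need to check _InitialView rather than rlw-view; the latter contains all the annotations created by the custom pipeline you built in Content Analytics Studio.

More details can be found here, and have a look at the replies too: https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Exporting_annotations_from_Watson_Explorer_Content_Analytics?lang=en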
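To make the view switch concrete, here is a minimal sketch that dumps every indexed feature structure in a named view together with its primitive-valued features, so you can see which type (if any) carries the filename or URL. It uses only core Apache UIMA calls; the class name ViewInspector and the method dumpView are hypothetical helpers, not part of UIMA or of Watson Explorer. Which types the server actually places in _InitialView is not stated above, which is why the sketch lists everything instead of asking for a specific type.

```java
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;

/** Hypothetical debugging helper; not part of UIMA or Watson Explorer. */
public final class ViewInspector {

    private ViewInspector() {
    }

    /**
     * Lists every indexed feature structure in the given view,
     * followed by its primitive-valued features (strings, numbers, booleans).
     */
    public static String dumpView(CAS cas, String viewName) {
        // Switch from whatever view the annotator was handed to the requested one.
        CAS view = cas.getView(viewName);
        StringBuilder out = new StringBuilder();

        // Asking for the top type returns all indexed feature structures.
        FSIterator<FeatureStructure> it =
                view.getIndexRepository().getAllIndexedFS(view.getTypeSystem().getTopType());
        while (it.hasNext()) {
            FeatureStructure fs = it.next();
            out.append(fs.getType().getName()).append('\n');
            for (Feature f : fs.getType().getFeatures()) {
                // Non-primitive features (references, arrays) are skipped.
                if (f.getRange().isPrimitive()) {
                    out.append("  ").append(f.getShortName()).append(" = ")
                       .append(fs.getFeatureValueAsString(f)).append('\n');
                }
            }
        }
        return out.toString();
    }
}
```

Inside process(JCas jcas) this could be called as ViewInspector.dumpView(jcas.getCas(), CAS.NAME_DEFAULT_SOFA), since CAS.NAME_DEFAULT_SOFA is the constant for "_InitialView". Once the dump shows which type and feature hold the document URL, the annotator can read that feature directly instead of dumping the whole view.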

java

uima

watson-explorer