MarkLogic semantics: treatment of rdf:about attribute values

I'm trying to import an RDF data source (RDF/XML) into MarkLogic 8.02 using mlcp 1.3.3.

During the import I'm flooded with warnings like this:

15/06/29 15:03:58 WARN contentpump.RDFReader: 57fad317-4744-4f88-a8f7-6c21c662ad08.rdf: {W107} Bad URI: Code: 45/UNREGISTERED_NONIETF_SCHEME_TREE in SCHEME: The scheme name has a "-" in it, but it does not start in "x-" and the prefix is not known as the prefix of an alternative tree for URI schemes.

Looking at the source data (RDF/XML), the warnings are caused by statements like this:

<rdf:Description
rdf:about="rvr-jurisprudentie:http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fjurisprudentie%2Fid%2FECLI%3ANL%3ARVS%3A2013%3A549:http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fbwb%2Fid%2FBWBR0005181%2F2986364%2F2015-01-01%2F2015-01-01">

So it looks like ML thinks this rdf:about attribute contains a URI, and complains that it isn't a valid one.

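So, three questions:

  1. Why does ML think this has to be a URI? I haven't encountered this problem with other toolsets.
  2. Is there a switch to ignore the warnings? (Piping the output in the terminal doesn't seem to work.)
  3. Does this have any further effects down the road (i.e. performance etc.)?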

why does ML think that this has to be a URI. I haven't encountered this problem with other toolsets.

Because that's what the RDF/XML syntax specification says:

aboutAttr = 
      attribute rdf:about { 
          URI-reference 
      }

The rdf:about attribute takes a URI, and your data does have something in that attribute that looks a lot like a URI:

rvr-jurisprudentie:http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fjurisprudentie%2Fid%2FECLI%3ANL%3ARVS%3A2013%3A549:http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fbwb%2Fid%2FBWBR0005181%2F2986364%2F2015-01-01%2F2015-01-01
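To make the structure of that value easier to see, here is a minimal Python sketch (standard library only) that splits off the scheme and percent-decodes the rest. It assumes the unencoded ":" characters act as separators between two embedded http URIs; that is an inference from the data, not something the source documents, and split_about is a hypothetical helper name.

```python
from urllib.parse import unquote

# The rdf:about value from the question, copied verbatim.
ABOUT = (
    "rvr-jurisprudentie:"
    "http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fjurisprudentie%2Fid%2F"
    "ECLI%3ANL%3ARVS%3A2013%3A549:"
    "http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fbwb%2Fid%2F"
    "BWBR0005181%2F2986364%2F2015-01-01%2F2015-01-01"
)


def split_about(value):
    """Split 'scheme:part1:part2' and percent-decode each part (hypothetical helper)."""
    # Everything before the first ':' is the URI scheme.
    scheme, _, rest = value.partition(":")
    # Assumption: the remaining raw ':' characters are separators, since every
    # ':' inside the embedded URIs is percent-encoded as %3A.
    parts = [unquote(p) for p in rest.split(":")]
    return scheme, parts


scheme, parts = split_about(ABOUT)
print(scheme)
for part in parts:
    print(part)
```

Run on the value above, it reports the scheme rvr-jurisprudentie and two embedded identifiers, http://linkeddata.overheid.nl/terms/jurisprudentie/id/ECLI:NL:RVS:2013:549 and http://linkeddata.overheid.nl/terms/bwb/id/BWBR0005181/2986364/2015-01-01/2015-01-01.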

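It's using a custom URI scheme, rvr-jurisprudentie. The RFC 3986 Generic URI Syntax allows a "-" in a scheme name, but the name doesn't start with "x-" and isn't a known alternative-tree prefix, which is the naming convention the warning checks, so MarkLogic issues a warning. It's still a valid URI, even if some or all tools may not be able to interpret it.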
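The warning is about the scheme name specifically. The Python sketch below separates two questions: whether the name fits the RFC 3986 scheme grammar (section 3.1: a letter followed by letters, digits, "+", "-" or "."), and whether it trips the "x-" rule quoted in the warning. is_rfc3986_scheme and looks_unregistered_nonietf are hypothetical helpers that only approximate the message text; they are not the parser MarkLogic actually uses, and the list of known alternative-tree prefixes is left empty because the source doesn't name any.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC3986_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_rfc3986_scheme(scheme):
    """True if the scheme name matches the RFC 3986 scheme grammar."""
    return RFC3986_SCHEME.fullmatch(scheme) is not None


def looks_unregistered_nonietf(scheme, known_tree_prefixes=()):
    """Rough imitation of the W107 message (hypothetical, not MarkLogic's code):
    flag a scheme containing '-' that starts neither with 'x-' nor with a
    known alternative-tree prefix."""
    s = scheme.lower()  # scheme names are case-insensitive
    if "-" not in s or s.startswith("x-"):
        return False
    return not any(s.startswith(p) for p in known_tree_prefixes)


for name in ["http", "x-rvr", "rvr-jurisprudentie"]:
    print(name, is_rfc3986_scheme(name), looks_unregistered_nonietf(name))
```

For rvr-jurisprudentie the first check passes and the second one flags it, which matches a warning rather than a hard error.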

is there a switch with which to ignore warnings (Piping output in terminal doesn't seem to work)

There doesn't seem to be one, but the MarkLogic Content Pump documentation shows how to enable DEBUG-level messages:

Edit the file MLCP_INSTALL_DIR/conf/log4j.properties. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j.properties. In log4j.properties, set the properties log4j.logger.com.marklogic.mapreduce and log4j.logger.com.marklogic.contentpump to DEBUG. For example, include the following:

log4j.logger.com.marklogic.mapreduce=DEBUG
log4j.logger.com.marklogic.contentpump=DEBUG

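Obviously that's the opposite of what you want, but since it's just a log4j configuration, you can turn logging down to ERROR just as easily as you can turn it up to DEBUG as in their example.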
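If you'd rather script the change than edit the file by hand, here is a minimal Python sketch that sets the two loggers named in the documentation to ERROR in the text of a log4j.properties file. set_logger_level is a hypothetical helper; it only handles the key=value and key:value forms, and reading and writing MLCP_INSTALL_DIR/conf/log4j.properties is left to you. It also assumes the W107 messages are logged under com.marklogic.contentpump (the warning shows the logger as contentpump.RDFReader), so that ERROR on that logger hides them.

```python
import re


def set_logger_level(props_text, logger, level):
    """Return props_text with log4j.logger.<logger> set to level,
    replacing an existing entry or appending a new one (hypothetical helper)."""
    key = "log4j.logger." + logger
    # Match the whole line for exactly this key; '[=:]' right after the key
    # keeps child loggers such as <logger>.RDFReader from matching.
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*[=:].*$", re.MULTILINE)
    line = f"{key}={level}"
    if pattern.search(props_text):
        return pattern.sub(lambda _m: line, props_text)
    if props_text and not props_text.endswith("\n"):
        props_text += "\n"
    return props_text + line + "\n"


original = "log4j.rootLogger=INFO, console\nlog4j.logger.com.marklogic.contentpump=DEBUG\n"
updated = original
for name in ("com.marklogic.mapreduce", "com.marklogic.contentpump"):
    updated = set_logger_level(updated, name, "ERROR")
print(updated)
```

Existing entries are replaced in place rather than duplicated, so running the script twice leaves the file unchanged the second time.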

does this have any further effects down the road (i.e. performance etc.)?

You might run into problems if you need to pass the data to other RDF or SPARQL tools that interpret the URI specification more strictly.