XDMP-DOCUTF8SEQ when trying to store binary content in MarkLogic

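My situation is slightly different from the one described in earlier posts: the content I am pushing is something I simply want to store in MarkLogic as a binary. I have code in a trigger that deals with the file's contents later. The URI of the uploaded content ends in .txt.

Using the Java API, I have: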

    BinaryDocumentManager docManager = binaryClient.newBinaryDocumentManager();
    BinaryWriteHandle handle = new BytesHandle(content).withFormat(Format.BINARY);

I was hoping this would bypass the UTF-8 requirement. Is my assumption correct?

    Server Message: XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at  line 1 -- document is not UTF-8 encoded

The Java API goes through the REST API, and some auto-magic processing happens when you call PUT /v1/documents to insert a document.

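If the URI has a known file extension, the MIME type mapping is used to determine the format. When you use a URI with a .txt file extension, it assumes that you are loading a text document.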

If you were to use a URI that does not end with the .txt file extension, such as .txt.bin, then it should be inserted as a binary() node.
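To make the extension-based behavior concrete, here is a minimal, self-contained Java sketch of that kind of lookup. It is not MarkLogic's implementation: the class name `FormatByExtension`, the `Format` enum, and the small mapping table are all hypothetical, and the fallback to binary for unrecognized extensions is an assumption. The real mappings live in the server's Mimetypes configuration.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: mimics choosing a document format from the URI extension.
public class FormatByExtension {
    public enum Format { TEXT, XML, JSON, BINARY }

    // Hypothetical subset of an extension-to-format table (not the server's actual list).
    private static final Map<String, Format> EXTENSION_FORMATS = Map.of(
            "txt", Format.TEXT,
            "xml", Format.XML,
            "json", Format.JSON,
            "pdf", Format.BINARY,
            "bin", Format.BINARY);

    // Only the text after the last '.' in the final path segment is considered.
    public static Format resolve(String uri) {
        int slash = uri.lastIndexOf('/');
        int dot = uri.lastIndexOf('.');
        if (dot <= slash || dot == uri.length() - 1) {
            return Format.BINARY; // assumption: no usable extension falls back to binary
        }
        String extension = uri.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption: unrecognized extensions also fall back to binary.
        return EXTENSION_FORMATS.getOrDefault(extension, Format.BINARY);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/docs/report.txt"));     // prints TEXT
        System.out.println(resolve("/docs/report.txt.bin")); // prints BINARY
    }
}
```

In this sketch only the final extension counts, which is why appending .bin to a .txt URI changes the outcome from text to binary.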

If you want to insert a document with a .txt file extension as a binary() node, you may need to insert it a different way.

General Content Type Guidelines

The following guidelines apply to specifying input and output content type for most requests:

  • Document content: Rely on the MarkLogic Server MIME type mapping defined for the URI extension.
  • Non-document data: Set the request Content-type and/or Accept headers. In most cases, this means setting the header(s) to application/xml or application/json.

The installation-wide MarkLogic Server MIME type mappings define associations between MIME type, URI extensions, and document format. For example, the default mappings associate the MIME type application/pdf, the 'pdf' URI extension, and the binary document format. You can view, change, and extend the mappings in the 'Mimetypes' section of the Admin Interface or using the XQuery functions admin:mimetypes-get and admin:mimetypes-add.