Should ASCII control characters be stripped from documents before sending to Vespa?

Question

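I am trying to store a document with a string field in Vespa. When I use the document-api HTTP endpoint, the document is rejected with a parse error. I have verified that the JSON being sent is valid (other documents work fine).

This is the error message I am seeing: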

PARSER_ERROR Error in document 'id:x:y:n=1:1FVzo2l7mMLticB0WMkBKIECMLzAg' - could not parse field 'content' of type 'string': The string field value contains illegal code point 0xB

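I can see that these kinds of characters (a vertical tab in my case) are checked for in allowedAsciiChars in com.yahoo.text.Text, but I don't see anywhere in the documentation that I am supposed to strip these characters before sending to Vespa. In fact, I see sort of the opposite situation where Vespa will go out of its way to replace certain chars behind the scenes without rejecting them.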
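To narrow down which characters in a field would trigger this kind of error, a small check along these lines may help. It is only a sketch in Python, not Vespa's own logic: the function name find_control_chars is made up for illustration, and the assumption that tab, line feed and carriage return are acceptable while every other code point below 0x20 is not should be checked against com.yahoo.text.Text.

```python
# Assumption: tab (0x09), line feed (0x0A) and carriage return (0x0D) are
# acceptable; every other code point below 0x20 is reported as suspicious.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def find_control_chars(text):
    """Return (index, code point) pairs for suspicious ASCII control characters."""
    return [
        (index, ord(ch))
        for index, ch in enumerate(text)
        if ord(ch) < 0x20 and ord(ch) not in ALLOWED_CONTROLS
    ]


# A vertical tab (0x0B) hidden between two lines of text.
sample = "first line\x0bsecond line\n"
for index, code_point in find_control_chars(sample):
    print(f"index {index}: 0x{code_point:X}")  # prints "index 10: 0xB"
```

Running this over the 'content' value before feeding shows where the offending code point sits, which makes it easier to trace back to the source data.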

Answer 1

Please strip ASCII control characters from documents before feeding.
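A minimal sketch of such a cleanup step on the client side, in Python, could look like this. The helper names strip_control_chars and clean_fields are hypothetical, the document is a made-up example, and the set of characters kept (tab, line feed, carriage return) is an assumption rather than Vespa's exact rule.

```python
# Assumption: keep tab (0x09), line feed (0x0A) and carriage return (0x0D),
# drop every other code point below 0x20. Vespa's actual rules may differ.
_KEEP = {0x09, 0x0A, 0x0D}
_DROP_TABLE = {cp: None for cp in range(0x20) if cp not in _KEEP}


def strip_control_chars(text):
    """Remove ASCII control characters that are not in the keep set."""
    return text.translate(_DROP_TABLE)


def clean_fields(fields):
    """Return a copy of a flat fields dict with every string value cleaned."""
    return {
        name: strip_control_chars(value) if isinstance(value, str) else value
        for name, value in fields.items()
    }


# Illustrative document only; field names are taken from the error message.
doc = {"fields": {"content": "page one\x0bpage two", "size": 2}}
doc["fields"] = clean_fields(doc["fields"])
print(doc)  # {'fields': {'content': 'page onepage two', 'size': 2}}
```

Note that removing the character outright joins the neighbouring words. If word boundaries matter for indexing, mapping the dropped code points to a space instead of None may be the better choice.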

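I will update the documentation. That said, the JSON spec seems to say that these control characters must be escaped, so they are implicitly not allowed in the feed.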
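The escaping rule can be illustrated with Python's standard json module. This sketch only shows generic JSON behaviour, not Vespa's parser: a serializer escapes the vertical tab, so the payload is valid JSON, but decoding it yields the same code point again, which is presumably what the string field check then rejects.

```python
import json

value = "page one\x0bpage two"

# A conforming serializer escapes the control character, so the JSON is valid.
encoded = json.dumps({"content": value})
print(encoded)  # {"content": "page one\u000bpage two"}

# Decoding restores the original string, including code point 0xB.
decoded = json.loads(encoded)
print(decoded["content"] == value)  # True

# A raw, unescaped control character inside a JSON string is not valid JSON,
# and Python's parser rejects it in its default strict mode.
raw = '{"content": "a\x0bb"}'
try:
    json.loads(raw)
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)  # True
```

So a payload can pass JSON validation and still carry a control character in the decoded field value, which is why cleaning has to happen on the string itself rather than on the JSON text.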

Answer 2

"I see sort of the opposite situation where Vespa will go out of its way to replace certain chars behind the scenes"

Where do you see this?

In Java, there is a Text.stripInvalidCharacters utility method, offered to clients that need to strip such characters from uncleaned text.

vespa