Play XML parser doesn't handle BOM (Byte Order Mark)

Question

I wrote a simple XML endpoint with Play Framework 2.8.0:

def xmlEndpoint: Action[NodeSeq] = Action.async(parse.xml) ...

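It is used with a legacy client that sends POST requests with the text/xml content type. The problem is that the request body starts with a so-called BOM (byte order mark):

This is the byte sequence EF BB BF (shown as ï»¿ when decoded as Latin-1), which indicates that what follows in the body is UTF-8. Play sees this prefix and returns an error: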

For request 'POST /xmla' [Invalid XML: Content is not allowed in prolog.]

I tried taking the request without parsing it and then removing the prefix like this:

def xmla: Action[AnyContent] = Action({ implicit r: Request[AnyContent] => {
  val validXmlBOM: Option[NodeSeq] = r.body.asText
    .map(_.replace("ï»¿", ""))
    .map(scala.xml.XML.loadString)
  Ok(validXmlBOM.get.toString())
}})

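But that doesn't work for me. Play still detects that the body is an XML payload and tries to parse it.

Does anyone have a solution for this? Maybe a custom parser?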

Answer 1

I just finished a custom parser that strips the BOM from the request and parses it into a NodeSeq object. It's only a mockup, but it should give you the idea:

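import akka.stream.scaladsl.{Flow, Framing, Keep, Sink}
import akka.util.ByteString
import play.api.libs.streams.Accumulator
import play.api.mvc.{BodyParser, RequestHeader, Result}
import play.api.mvc.Results.BadRequest
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}
import scala.xml.NodeSeq

def tolerantXmlBOM(maxLength: Long = 10000)(implicit executionContext: ExecutionContext): BodyParser[NodeSeq] =
  BodyParser { request: RequestHeader =>
    val sink: Sink[ByteString, Future[Either[Result, NodeSeq]]] = Flow[ByteString]
      // Split the body into lines of at most 1000 bytes each
      .via(Framing.delimiter(ByteString("\n"), 1000, allowTruncation = true))
      .map(_.utf8String)
      // Drop every line that starts with the BOM characters (the whole line is discarded)
      .filterNot(_.startsWith("ï»¿"))
      // Concatenate the remaining lines into a single string
      .fold("") { case (acc, s) => acc + s }
      .map { s =>
        // Turn a parsing failure into a 400 response instead of throwing
        Try(scala.xml.XML.loadString(s)) match {
          case Success(parsed) => Right(parsed)
          case Failure(e)      => Left(BadRequest(s"Invalid XML: ${e.getMessage}"))
        }
      }
      .toMat(Sink.last)(Keep.right)
    Accumulator(sink)
  }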

In addition, I still need to replace the utf8String conversion and work directly at the byte level, since my input is a ByteString.
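For the byte-level step, a minimal sketch could look like the following. It uses only the Scala standard library and works on a plain Array[Byte]; the names BomStripping, Utf8Bom and stripUtf8Bom are made up for this illustration and are not part of Play or Akka. It assumes the whole body has already been accumulated, because in a stream the three BOM bytes could be split across chunks.

```scala
object BomStripping {
  // UTF-8 encoding of U+FEFF (the byte order mark): EF BB BF
  val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  /** Drops a leading UTF-8 BOM if present; otherwise returns the input unchanged. */
  def stripUtf8Bom(bytes: Array[Byte]): Array[Byte] =
    if (bytes.startsWith(Utf8Bom)) bytes.drop(Utf8Bom.length) else bytes
}
```

Something like new String(BomStripping.stripUtf8Bom(body), "UTF-8") would then give the XML text without the prefix, ready for scala.xml.XML.loadString. Since ByteString is an IndexedSeq[Byte], the same startsWith/drop check can probably be applied to it directly, but I haven't verified that inside the parser above. Note also that decoding the BOM as UTF-8 yields the single character U+FEFF rather than the three characters ï»¿, which is one more reason to strip it before decoding.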

xml

playframework