iText7 doesn't accept special HTML characters inside @page rule

Question

I'm following the iText7 tutorial to convert some HTML to a PDF with a footer. It seems that CssRuleSetParser breaks on the semicolon after &aacute:

<html>
  <head>
    <style>
      @page {
        @bottom-right {
          content: "P&aacute;gina " counter(page) " de " counter(pages);
        }
      }
    </style>
  </head>
  <body>
    <p>Minha terra tem palmeiras<br/>Onde canta o sabi&aacute;<br/>As aves que aqui gorjeiam<br/>N&atilde;o gorjeiam como l&aacute;</p>
  </body>
</html>

The special characters in the body work perfectly.

There is nothing special about the Java code:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.bind.annotation.RestController;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.styledxmlparser.css.media.MediaDeviceDescription;
import com.itextpdf.styledxmlparser.jsoup.nodes.Document;

@RestController
@RequestMapping("/html2pdf")
public class Html2PdfController {

  @PostMapping(produces = MediaType.APPLICATION_PDF_VALUE)
  public @ResponseBody byte[] convert(@RequestBody String html) throws IOException {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
      var converterProperties = new ConverterProperties(); 
      var mediaDeviceDescription = new MediaDeviceDescription(com.itextpdf.styledxmlparser.css.media.MediaType.PRINT);
      converterProperties.setMediaDeviceDescription(mediaDeviceDescription);
      HtmlConverter.convertToPdf(html, baos, converterProperties);
      return baos.toByteArray();
    }
  }
}

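What would be the best approach? Should I try to "sanitize" the special characters in the CSS? :-(

Edit:

Slashes [forward|back] aren't accepted either. They are simply ignored: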

@bottom-right {
  content: counter(page) " / " counter(pages);
}

prints (for example)

1  8

Answer 1

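You can encode your HTML in UTF-8 (don't forget to tell browsers/iText about it with a <meta> tag) and use your non-ASCII characters directly.

Example of the source file, adapted according to the suggestion above: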

<html>
<head>
  <meta charset="UTF-8">
  <style>
    @page {
      @bottom-right {
        content: "Página " counter(page) " / " counter(pages);
      }
    }
  </style>
</head>
<body>
<p>Minha terra tem palmeiras<br/>Onde canta o sabiá<br/>As aves que aqui gorjeiam<br/>Não gorjeiam como lá</p>
</body>
</html>
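If the incoming HTML cannot be re-authored at the source, the "sanitize the CSS" idea from the question could be sketched as a pre-processing step that runs before HtmlConverter.convertToPdf. HTML treats the content of <style> as raw text, so character references such as &aacute; are not expanded there the way they are in the body. The sketch below replaces them with literal characters inside <style> blocks only. The class name StyleEntityDecoder, its method names and the tiny entity table are hypothetical and not part of iText; it uses only the Java standard library and a simple regex, so it is an illustration under those assumptions rather than a full HTML parser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an iText class: expands HTML character references
// found inside <style>...</style> so the CSS contains literal characters.
final class StyleEntityDecoder {

  // Deliberately small, illustrative table of named entities; extend as needed.
  private static final Map<String, String> NAMED = Map.of(
      "aacute", "\u00e1",  // a with acute accent
      "atilde", "\u00e3",  // a with tilde
      "eacute", "\u00e9",  // e with acute accent
      "ccedil", "\u00e7"); // c with cedilla

  private static final Pattern STYLE_BLOCK = Pattern.compile(
      "(<style\\b[^>]*>)(.*?)(</style>)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  private static final Pattern ENTITY = Pattern.compile(
      "&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

  // Rewrites only the text between <style> and </style>; the rest is untouched.
  static String decodeEntitiesInStyle(String html) {
    Matcher m = STYLE_BLOCK.matcher(html);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String css = decodeEntities(m.group(2));
      m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + css + m.group(3)));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Expands numeric references and the named ones listed in NAMED.
  static String decodeEntities(String text) {
    Matcher m = ENTITY.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String ref = m.group(1);
      String replacement = m.group(); // default: leave the reference as it is
      if (ref.startsWith("#")) {
        boolean hex = ref.length() > 1 && (ref.charAt(1) == 'x' || ref.charAt(1) == 'X');
        int codePoint = hex
            ? Integer.parseInt(ref.substring(2), 16)
            : Integer.parseInt(ref.substring(1));
        if (Character.isValidCodePoint(codePoint)) {
          replacement = new String(Character.toChars(codePoint));
        }
      } else if (NAMED.containsKey(ref)) {
        replacement = NAMED.get(ref);
      }
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

In the controller from the question, this would amount to passing StyleEntityDecoder.decodeEntitiesInStyle(html) to HtmlConverter.convertToPdf instead of html. Encoding the document as UTF-8 and writing the characters directly, as suggested above, remains the simpler route when the HTML is under your control.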

Slashes also work fine, at least in the latest pdfHTML 4.0.1 that I tested with.
