使用 ITextRenderer 将 html 转换为 PDF 的阿拉伯语问题
Arabic problems with converting html to PDF using ITextRenderer
当我使用 ITextRenderer 将 html 转换为 PDF.this 是我的代码
ByteArrayOutputStream out = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
String inputFile = "C://Users//Administrator//Desktop//aaa2.html";
String url = new File(inputFile).toURI().toURL().toString();
renderer.setDocument(url);
renderer.getSharedContext().setReplacedElementFactory(
new B64ImgReplacedElementFactory());
// 解决阿拉伯语问题
ITextFontResolver fontResolver = renderer.getFontResolver();
try {
fontResolver.addFont("C://Users//Administrator//Desktop//arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (DocumentException e) {
e.printStackTrace();
}
renderer.layout();
OutputStream outputStream = new FileOutputStream("C://Users//Administrator//Desktop//HTMLasPDF.pdf");
renderer.createPDF(outputStream, true);
/*PdfWriter writer = renderer.getWriter();
writer.open();
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
OutputStream outputStream2 = new FileOutputStream( "C://Users//Administrator//Desktop//HTMLasPDFcopy.txt");
renderer.createPDF(outputStream2);*/
renderer.finishPDF();
out.flush();
out.close();
实际 PDF 结果:
预期 PDF 结果:
如何制作阿拉伯文连字?
希腊字符似乎被省略了;他们没有出现在文档中。
In flying saucer the generated PDF uses some kind of default
(probably Helvetica) font, that contains a very limited character set,
that obviously does not contain the Greek code page. link
如果你想正确地做到这一点(我假设使用 iText,因为你的 post 被标记为这样),你应该使用
- iText7
- pdfHTML(将 HTML 转换为 PDF)
- pdfCalligraph(正确处理阿拉伯语连字)
- 支持这些功能的字体(如另一个答案所示)
示例请参考HTML to PDF tutorial, more specifically the following FAQ item: How to convert HTML containing Arabic/Hebrew characters to PDF?
您需要包含所需字形的字体,例如:
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
并且您需要 FontProvider
知道如何在 ConverterProperties
:
中找到这些字体
public void createPdf(String src, String[] fonts, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
for (String font : fonts) {
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram);
}
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
请注意,如果您想了解有关连字的更多信息,如果您没有pdfCalligraph add-on. That add-on didn't exist at the time Flying Saucer was created, hence you can't use Flying Saucer for converting documents with text in Arabic, Hindi, Telugu,... Read the pdFCalligraph white paper,文本将完全错误。
我改用wkhtmltopdf转pdf的方式
当我使用 ITextRenderer 将 html 转换为 PDF.this 是我的代码
ByteArrayOutputStream out = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
String inputFile = "C://Users//Administrator//Desktop//aaa2.html";
String url = new File(inputFile).toURI().toURL().toString();
renderer.setDocument(url);
renderer.getSharedContext().setReplacedElementFactory(
new B64ImgReplacedElementFactory());
// 解决阿拉伯语问题
ITextFontResolver fontResolver = renderer.getFontResolver();
try {
fontResolver.addFont("C://Users//Administrator//Desktop//arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (DocumentException e) {
e.printStackTrace();
}
renderer.layout();
OutputStream outputStream = new FileOutputStream("C://Users//Administrator//Desktop//HTMLasPDF.pdf");
renderer.createPDF(outputStream, true);
/*PdfWriter writer = renderer.getWriter();
writer.open();
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
OutputStream outputStream2 = new FileOutputStream( "C://Users//Administrator//Desktop//HTMLasPDFcopy.txt");
renderer.createPDF(outputStream2);*/
renderer.finishPDF();
out.flush();
out.close();
实际 PDF 结果:
预期 PDF 结果:
如何制作阿拉伯文连字?
希腊字符似乎被省略了;他们没有出现在文档中。
In flying saucer the generated PDF uses some kind of default (probably Helvetica) font, that contains a very limited character set, that obviously does not contain the Greek code page. link
如果你想正确地做到这一点(我假设使用 iText,因为你的 post 被标记为这样),你应该使用
- iText7
- pdfHTML(将 HTML 转换为 PDF)
- pdfCalligraph(正确处理阿拉伯语连字)
- 支持这些功能的字体(如另一个答案所示)
示例请参考HTML to PDF tutorial, more specifically the following FAQ item: How to convert HTML containing Arabic/Hebrew characters to PDF?
您需要包含所需字形的字体,例如:
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
并且您需要 FontProvider
知道如何在 ConverterProperties
:
public void createPdf(String src, String[] fonts, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
for (String font : fonts) {
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram);
}
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
请注意,如果您想了解有关连字的更多信息,如果您没有pdfCalligraph add-on. That add-on didn't exist at the time Flying Saucer was created, hence you can't use Flying Saucer for converting documents with text in Arabic, Hindi, Telugu,... Read the pdFCalligraph white paper,文本将完全错误。
我改用wkhtmltopdf转pdf的方式