Can't set RTL direction for Hebrew letters while converting from *.xhtml to *.pdf by using iText library
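I'm trying to convert an *.xhtml file with Hebrew characters (UTF-8) to PDF using the iText library, but all the letters come out in reverse order.
As far as I understand from this question, I can set RTL only for ColumnText and PdfPCell objects: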
Arabic (and Hebrew) can only be rendered correctly in the context of
ColumnText and PdfPCell.
So I wonder whether it is possible to convert a whole *.xhtml page to PDF at all.
Here is the *.xhtml file I'm trying to import:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title of document</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
שלום עולם
</body>
</html>
Here is the Java code I use:
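import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public static void convert() throws Exception {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("import.pdf"));
    // Attempt to force right-to-left rendering for the whole document
    writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
    document.open();
    // Read the XHTML file as UTF-8 into memory
    String str = null;
    BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("import.xhtml"), StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder();
    while ((str = in.readLine()) != null) {
        System.out.println(str);
        // Keep the line break so that tokens on adjacent lines are not merged
        sb.append(str).append('\n');
    }
    in.close();
    // Let XML Worker parse the XHTML and add the content to the document
    XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
    InputStream is = new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
    worker.parseXHtml(writer, document, is, StandardCharsets.UTF_8);
    document.close();
}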
This is what I get so far:
Thanks for your help.
Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:
<html>
<head>
<title>Hebrew text</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
<div dir="rtl" style="font-family: Noto Sans Hebrew">שלום עולם</div>
</body>
</html>
We then convert it to PDF with the following code:
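import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles: resolve inline style attributes and register only the Hebrew font
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
    // Pipelines: CSS -> HTML -> PDF
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is a constant of the example class (not shown here) holding the path to hebrew.html
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}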
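The result looks like hebrew.pdf:
Which hurdles did you need to overcome?
- You need to wrap your text in an element such as a <div> or a <td>.
- You need to add the attribute dir="rtl" to define the direction.
- You need to make sure that the font you use can display Hebrew. I used a NOTO font for Hebrew. It is one of the fonts distributed by Google in their program to provide fonts for every possible language.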
I can't read Hebrew, but I hope the resulting PDF is correct and that it solves your problem.
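If you generate the markup yourself before handing it to XML Worker, the first two points can be automated. What follows is only a minimal sketch under that assumption: the class RtlWrapper and its methods escapeXml, needsRtl and wrap are hypothetical names and not part of iText. It uses java.text.Bidi from the JDK to detect text that needs right-to-left handling, and wraps it in a <div> with dir="rtl" and a font-family of your choice.

```java
import java.text.Bidi;

// Hypothetical helper, not part of iText: prepares text for XML Worker
// by wrapping it in a <div> and adding dir="rtl" when required.
final class RtlWrapper {

    // Escapes the characters that are special in XHTML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Returns true if the text contains characters that need
    // right-to-left handling (for example Hebrew or Arabic letters).
    static boolean needsRtl(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Wraps the text in a <div>; dir="rtl" is added only when the text needs it.
    static String wrap(String text, String fontFamily) {
        String dir = needsRtl(text) ? " dir=\"rtl\"" : "";
        return "<div" + dir + " style=\"font-family: " + escapeXml(fontFamily) + "\">"
                + escapeXml(text) + "</div>";
    }

    public static void main(String[] args) {
        // "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD" is the Hebrew sample text from hebrew.html
        System.out.println(wrap("\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD", "Noto Sans Hebrew"));
        System.out.println(wrap("Hello world", "Arial"));
    }
}
```

The first call produces a <div> carrying dir="rtl", like the one in hebrew.html; the second produces a plain <div> because the text is purely left-to-right. The font named in the style attribute still has to be registered with the font provider, as in the createPdf() method.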
Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in that version.