PDFBox caching in AWS Lambda

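I am using Apache PDFBox to process PDF files.

It works fine when I run it locally, but I get an error when I execute the code on AWS Lambda. That makes sense, because PDFBox tries to update the font cache, which is not possible on Lambda.

I get the following error message: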

Feb 20, 2017 3:22:19 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
WARNING: New fonts found, font cache will be re-built
Feb 20, 2017 3:22:19 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building on-disk font cache, this may take a while
Feb 20, 2017 3:22:20 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveDiskCache
SEVERE: Could not write to font cache

java.io.FileNotFoundException: /home/sbx_user1063/.pdfbox.cache (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at java.io.FileWriter.<init>(FileWriter.java:90)
at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.saveDiskCache(FileSystemFontProvider.java:290)
at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:226)
at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:130)
at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:149)
at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:413)
at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:376)
at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:350)
at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:145)
at org.apache.pdfbox.pdmodel.font.PDType1Font.<clinit>(PDType1Font.java:79)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:829)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:486)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:460)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
at de.scdm.panther.ParsePdf.handleRequest(ParsePdf.java:59)
at de.scdm.panther.ParsePdf.handleRequest(ParsePdf.java:22)
at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:375)
at lambdainternal.EventHandlerLoader.call(EventHandlerLoader.java:1139)
at lambdainternal.AWSLambda.call(AWSLambda.java:94)
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:285)
at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:57)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:94)

Feb 20, 2017 3:22:20 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Finished building on-disk font cache, found 52 fonts

How can I disable the font cache update? Has anyone run into a similar problem?

Thanks!

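You can't disable it, but the message is harmless: it won't stop your job. The only cost is that the next run won't be any faster, because your fonts will be scanned again.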

You can set the property "pdfbox.fontcache" to a directory you can write to, e.g. /tmp, which should exist on AWS Lambda.
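As a minimal sketch of that suggestion, the property can be set from code before any PDFBox class is touched. The stack trace above shows the cache being built from a static initializer (`FontMapperImpl$DefaultFontProvider.<clinit>`), so the property presumably has to be in place before the first font lookup. The class and method names here (`FontCacheConfig`, `configureFontCache`) are hypothetical and not part of PDFBox; only the property name comes from the answer.

```java
import java.io.File;

public class FontCacheConfig {
    // Property name taken from the answer above.
    static final String FONT_CACHE_PROPERTY = "pdfbox.fontcache";

    // Hypothetical helper: points the font cache at `dir` if that directory
    // exists and is writable, and returns the resulting property value.
    static String configureFontCache(String dir) {
        File candidate = new File(dir);
        if (candidate.isDirectory() && candidate.canWrite()) {
            System.setProperty(FONT_CACHE_PROPERTY, candidate.getAbsolutePath());
        }
        return System.getProperty(FONT_CACHE_PROPERTY);
    }

    public static void main(String[] args) {
        // On AWS Lambda this would be "/tmp"; the JVM temp dir is used here
        // so the sketch also runs on other systems.
        String dir = System.getProperty("java.io.tmpdir");
        System.out.println("pdfbox.fontcache = " + configureFontCache(dir));
    }
}
```

In a Lambda handler, a call such as `FontCacheConfig.configureFontCache("/tmp")` would go at the very top of `handleRequest` or in a static initializer of the handler class, ahead of any call like `PDFTextStripper.getText`.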

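PDFBox first looks at the "pdfbox.fontcache" property. If that is not set, it looks at the "user.home" property (which is what happens on your system), and if that one is not set either, it looks at the "java.io.tmpdir" property to select a directory to write the font cache to.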
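The lookup order described above can be sketched as follows. This is an illustration of the described fallback chain only, not PDFBox's actual source; `FontCacheLookup` and `resolveCacheFile` are hypothetical names, and the file name `.pdfbox.cache` is taken from the exception message in the question.

```java
import java.io.File;
import java.util.Properties;

public class FontCacheLookup {
    // Properties consulted, in the order the answer describes.
    static final String[] LOOKUP_ORDER = {
        "pdfbox.fontcache", "user.home", "java.io.tmpdir"
    };

    // Returns the cache file under the first property that is set,
    // or null if none of the three is set.
    static File resolveCacheFile(Properties props) {
        for (String key : LOOKUP_ORDER) {
            String dir = props.getProperty(key);
            if (dir != null) {
                return new File(dir, ".pdfbox.cache");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shows where the cache would go for the current JVM settings.
        System.out.println(resolveCacheFile(System.getProperties()));
    }
}
```

With only "user.home" set to `/home/sbx_user1063`, this resolves to the unwritable path seen in the log; setting "pdfbox.fontcache" to `/tmp` takes precedence and moves the file there.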