PyPDF2 decoding issue when adding annotations in Chinese characters with addJS

Question

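I want to add annotations programmatically with PyPDF2 through addJS. It works for Latin characters but not for Chinese characters. I tried UTF-8 encoding, but that does not seem to work either. Here is the code: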

from PyPDF2 import PdfFileWriter, PdfFileReader

def Test():
    # The second positional argument of PdfFileReader is `strict`, not a file mode,
    # so the stray "rb" has been dropped.
    inputPDF = PdfFileReader('./demo/TESTPDFANNOTATION.pdf')
    outputPDF = PdfFileWriter()

    # Copy every page of the input into the output document.
    pages = inputPDF.getNumPages()
    for p in range(pages):
        outputPDF.addPage(inputPDF.getPage(p))

    # Document-level JavaScript that adds a FreeText annotation on the first page.
    outputPDF.addJS("var annot = this.addAnnot({ \r \
                    page: 0, \r \
                    type: 'FreeText', \r \
                    contents: '你好', \r \
                    textFont: 'csongl', \r \
                    textSize: 10, \r \
                    rect: [200, 300, 200+150, 300+3*12], // height for three lines \r \
                    width: 1, \r \
                    alignment: 1 \r \
                    });")

    # Write the result; the with-block closes the file even if write() fails.
    with open('./demo/TESTPDFANNOTATIONOUT.pdf', "wb") as outputStream:
        outputPDF.write(outputStream)
    return "ok"

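Strangely, if I open the PDF in the Notepad text editor, the Chinese characters are displayed correctly, but when the file is opened in a PDF reader it shows something like ä½€å¥½. It seems the text is not being decoded: an online conversion tool can decode it into almost the right Chinese characters, although in some cases they are not exactly the same. https://cafewebmaster.com/online_tools/utf_decode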
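The garbled output looks like UTF-8 bytes being read one byte per character. Here is a minimal sketch of that effect in plain Python. Latin-1 is used only as a stand-in for whatever single-byte encoding the PDF reader applies, which is an assumption on my part and not something I have confirmed:

```python
# Assumption: the viewer reads the UTF-8 bytes of the string one byte at a time.
# Latin-1 stands in for the viewer's single-byte encoding.
text = "你好"

utf8_bytes = text.encode("utf-8")        # 6 bytes, 3 per character
garbled = utf8_bytes.decode("latin-1")   # each byte becomes its own character

# Reversing the wrong decoding step recovers the original text.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(utf8_bytes), len(garbled))
print(repr(garbled))
print(recovered)
```

If the reader's single-byte encoding differs from Latin-1 for some byte values, the round trip would be lossy for those bytes, which might explain why the online tool only gets "almost" the right characters.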

Any suggestions would be much appreciated!

Python version: 3.9+, OS: Windows 10

Thanks, Stanley

Answer 1

In the end, I worked out that another package, PyMuPDF, can add annotations programmatically, and it handles Chinese characters well.

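import fitz  # PyMuPDF

def writeAnnotation():
    # RGB colours with components in the range 0..1
    blue  = (0,0,1)
    gold  = (1,1,0)

    pdfDoc = fitz.open('./demo/TESTPDFANNOTATION.pdf')
    page = pdfDoc[0]  # first page

    # Rectangle that will hold the annotation
    rect1 = fitz.Rect(100,100,200,150)

    strContent1= "你好！世界"

    # Newer PyMuPDF releases name this method add_freetext_annot.
    a1 = page.addFreetextAnnot(rect1, strContent1, text_color=blue,  fill_color=gold)

    pdfDoc.save("./demo/TESTPDFANNOTATIONOUT.pdf")
    return "Well done!"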
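For anyone who has to stay with PyPDF2 and addJS, one possible workaround is to keep the JavaScript source pure ASCII by writing each non-ASCII character as a JavaScript \uXXXX escape, so that no multi-byte text reaches the PDF at all. This is only a sketch: the helper name `js_escape_non_ascii` is made up for illustration, and I have not checked the result in a PDF reader.

```python
def js_escape_non_ascii(text: str) -> str:
    # Hypothetical helper: rewrite every non-ASCII character as a JavaScript
    # backslash-u escape and leave ASCII characters untouched.
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # UTF-16BE yields one 2-byte unit for BMP characters and a
            # surrogate pair (two units) for characters outside the BMP.
            data = ch.encode("utf-16-be")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "big")
                out.append("\\u%04x" % unit)
    return "".join(out)


js_line = js_escape_non_ascii("contents: '你好',")
print(js_line)
```

The escaped string could then be passed to addJS in place of the literal Chinese text; the JavaScript engine in the reader would be the one turning the escapes back into characters.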
