Base64 decode embedded PDF in TypeScript

Question

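In an XML file we have a base64-encoded string representing a PDF file that contains some table representations, i.e. similar to this example. When decoding the base64 string of that PDF document (i.e. such as this), we end up with a PDF document of 66 kB in size that opens correctly in any PDF viewer.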

Trying to decode the same base64-encoded string with Buffer in TypeScript (within a VSCode extension), i.e. with the following function:

function decodeBase64(base64String: string): string {
    const buf: Buffer = Buffer.from(base64String, "base64");
    return buf.toString();
}

// the base64 encoded string is usually extracted from an XML file directly
// for testing purposes we load that base64 encoded string from a local file
const base64Enc: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const base64Decoded: string = decodeBase64(base64Enc);

fs.writeFileSync(".../table.pdf", base64Decoded);

we end up with a PDF of 109 kB in size that cannot be opened with a PDF viewer.

For a simple PDF, such as this one, with a base64 encoded string representation like this, the code above works and the PDF can be read in any PDF viewer.

I also tried reading the locally stored base64-encoded representation of the PDF file directly, using

const buffer: string | Buffer = fs.readFileSync(".../base64Enc.txt", "base64");

though that did not produce anything useful either.

Even with a slight adaptation of this suggestion, needed because atob(...) is not present (other suggestions replace atob with Buffer), which leaves the final code as follows:

const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");

// atob(...) is not present, other answers suggest to use Buffer for conversion
const binary: string = Buffer.from(buffer, 'base64').toString();
const arrayBuffer: ArrayBuffer = new ArrayBuffer(binary.length);
const uintArray: Uint8Array = new Uint8Array(arrayBuffer);

for (let i: number = 0; i < binary.length; i++) {
    uintArray[i] = binary.charCodeAt(i);
}
const decoded: string = Buffer.from(uintArray.buffer).toString();

fs.writeFileSync(".../table.pdf", decoded);

I do not end up with a readable PDF. The "decoded" table.pdf sample ends up 109 kB in size.

What am I doing wrong here? How can I decode a PDF such as the table.pdf sample to obtain a readable PDF document, similar to the functionality Notepad++ provides?

Answer 1

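Borrowing heavily from a related answer: this works if you use the Uint8Array constructor to get a Uint8Array directly from the Buffer, without going through a string: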

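import * as fs from "fs";

// Read the base64 text, decode it to raw bytes, and write those bytes unchanged.
const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const uintArray: Uint8Array = new Uint8Array(Buffer.from(buffer, 'base64'));
fs.writeFileSync(".../table.pdf", uintArray);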

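Writing the Uint8Array directly to the file ensures that no corruption is introduced by converting the bytes to a string and back, which changes the encoding. Buffer.toString() decodes as UTF-8 by default, and binary PDF content is generally not valid UTF-8.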
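To illustrate why the string round trip corrupts the file, here is a minimal sketch. The sample bytes and the helper name decodeBase64ToBytes are assumptions made for illustration, not part of the original post. It compares decoding through a UTF-8 string with keeping the decoded bytes in a Buffer:

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sample: "%P" followed by two bytes above 0x7F, as found in binary PDF streams.
const original: Buffer = Buffer.from([0x25, 0x50, 0x80, 0xff]);
const base64Enc: string = original.toString("base64");

// Lossy path: toString() decodes as UTF-8, so bytes that are not valid UTF-8
// are replaced by U+FFFD, which takes more than one byte when written back out.
const asText: string = Buffer.from(base64Enc, "base64").toString();
const viaString: Buffer = Buffer.from(asText, "utf8");

// Lossless path (hypothetical helper): return the decoded bytes, never a string.
function decodeBase64ToBytes(base64String: string): Buffer {
    return Buffer.from(base64String, "base64");
}
const viaBytes: Buffer = decodeBase64ToBytes(base64Enc);

console.log("original bytes:  ", original.length);
console.log("via string bytes:", viaString.length); // larger than the original
console.log("via Buffer equal:", viaBytes.equals(original));
```

This would also be consistent with the simple PDF surviving the string-based code if its content happens to be plain ASCII, while the PDF with tables grows from 66 kB to 109 kB.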

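Just a note: new Uint8Array(buf) copies the bytes of the Buffer; it does not point to the same internal array of bytes. A view that shares memory would need new Uint8Array(buf.buffer, buf.byteOffset, buf.length). Not that it matters in this case, since this code doesn't reference the Buffer outside of the constructor, but it is worth knowing in case someone decides to create a new variable for the output of Buffer.from(buffer, 'base64'). Since Buffer is itself a Uint8Array subclass, that variable could also be passed to fs.writeFileSync directly.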
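As a hedged illustration of copy versus shared-memory semantics, the following sketch builds a Uint8Array from a Buffer in two ways and then mutates the Buffer. The sample input and the variable names are assumptions chosen for the example:

```typescript
import { Buffer } from "node:buffer";

// "JVBERg==" is the base64 encoding of the four bytes "%PDF".
const source: Buffer = Buffer.from("JVBERg==", "base64");

// Copy: the typed-array constructor allocates new memory and copies the values.
const copy: Uint8Array = new Uint8Array(source);

// View: constructing over the underlying ArrayBuffer shares memory with the Buffer.
// byteOffset is required because Node.js may place small Buffers inside a shared pool.
const view: Uint8Array = new Uint8Array(source.buffer, source.byteOffset, source.length);

// Mutate the Buffer and observe which array follows the change.
source[0] = 0x00;

console.log(copy[0]); // 37 (0x25, unchanged)
console.log(view[0]); // 0 (follows the Buffer)
```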

Tags: javascript, base64, decoding, node.js, typescript