Unexpected result when calling toString on a buffer in Node

Question

I need to recover the data from a buffer on which toString has already been called. For example:

const buffer // I need this, or equivalent
const bufferString = buffer.toString() // This is all I have

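The Node documentation implies that .toString() defaults to 'utf8' encoding, so I should be able to recover the buffer with Buffer.from(bufferString, 'utf8'). This doesn't work: I get different data. (Perhaps some data is lost when converting to a string, although the documentation doesn't seem to mention this.)

Does anyone know why this happens, or how to fix it?

Here is the data I need to reproduce: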

const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr) // The buffer I want!
const bufferString = buffer.toString() // The string I have!, note .toString() and .toString('utf8') are equivalent
const differentBuffer = Buffer.from(bufferString, 'utf8')

You can get the initial intArr back from the buffer by doing the following:

JSON.parse(JSON.stringify(Buffer.from(buffer)))['data']
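For readers who want to check the bytes more directly, here is a minimal sketch comparing the JSON round trip above with Array.from. The four-byte sample is just the start of the question's data, and the names viaJson and viaArrayFrom are made up for illustration.

```javascript
// Sample bytes: the first four values of intArr from the question.
const buffer = Buffer.from([31, 139, 8, 0]);

// The approach from the question: Buffer serializes as { type: 'Buffer', data: [...] }.
const viaJson = JSON.parse(JSON.stringify(buffer)).data;

// A Buffer is a Uint8Array, so Array.from yields the same plain array of numbers.
const viaArrayFrom = Array.from(buffer);

console.log(viaJson);      // [ 31, 139, 8, 0 ]
console.log(viaArrayFrom); // [ 31, 139, 8, 0 ]
```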

Edit: Interestingly, calling .toString() on differentBuffer gives the same initial string.

Answer 1

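I think the important part of the documentation you linked is: "When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � will be used to represent those errors." When you convert your buffer to a utf8 string, not all of the bytes are valid utf8. You can see this by running console.log(bufferString); almost all of the output is gibberish. So when converting from a buffer to a utf8 string you irretrievably lose data, and you cannot get the lost data back when converting back to a buffer.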
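A minimal sketch of that loss, using only the first three bytes of the question's data (the variable names here are made up for illustration). The byte 0x8B (139) is a UTF-8 continuation byte with no lead byte before it, so it cannot be decoded:

```javascript
// First three bytes of the question's data: 0x1F, 0x8B, 0x08.
const original = Buffer.from([31, 139, 8]);

// Decoding replaces the invalid 0x8B byte with U+FFFD.
const decoded = original.toString('utf8');
const hasReplacement = decoded.includes('\uFFFD');

// Re-encoding writes U+FFFD as the three bytes EF BF BD, so 0x8B is gone for good.
const reencoded = Buffer.from(decoded, 'utf8');

console.log(hasReplacement);             // true
console.log(reencoded.toString('hex'));  // 1fefbfbd08
console.log(original.equals(reencoded)); // false

// U+FFFD is itself valid UTF-8, so decoding the new buffer gives the same string.
console.log(reencoded.toString('utf8') === decoded); // true
```

The last line is also why calling .toString() on differentBuffer returns the same string as before: the replacement characters survive the round trip even though the original bytes do not.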

In your example, if you use utf16 instead of utf8, you don't lose information, so the buffer is identical once converted back, i.e.:

const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr);
const bufferString = buffer.toString('utf16le');
const differentBuffer = Buffer.from(bufferString, 'utf16le');
console.log(buffer); // same as the below log
console.log(differentBuffer); // same as the above log
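One caveat worth checking before relying on this: utf16le uses two bytes per code unit, so a string always encodes back to an even number of bytes. The example data has an even length, which is why it round-trips. A rough sketch of the difference (the roundTrip helper is hypothetical, written only for this illustration):

```javascript
// Hypothetical helper: decode to a utf16le string and encode straight back.
const roundTrip = (buf) => Buffer.from(buf.toString('utf16le'), 'utf16le');

const evenLength = Buffer.from([31, 139, 8, 0]); // 4 bytes -> 2 code units
const oddLength = Buffer.from([31, 139, 8]);     // 3 bytes -> last byte has no partner

console.log(roundTrip(evenLength).equals(evenLength)); // true
console.log(roundTrip(oddLength).equals(oddLength));   // false
```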

Answer 2

Use the 'latin1' or 'binary' encoding with both Buffer.toString and Buffer.from. These two encodings are the same, and they map each byte to one of the Unicode characters U+0000 to U+00FF.
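As a quick sanity check, this sketch round-trips every possible byte value through 'latin1'. The names allBytes, text, and restored are illustrative only:

```javascript
// Build a buffer containing every byte value from 0 to 255.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// Each byte becomes exactly one character in the range U+0000..U+00FF.
const text = allBytes.toString('latin1');

// Encoding with the same encoding restores the original bytes.
const restored = Buffer.from(text, 'latin1');

console.log(text.length);               // 256
console.log(restored.equals(allBytes)); // true
```

Because the mapping is one byte to one character, the string length equals the buffer length, and nothing is replaced or dropped. This only helps if the string was produced with 'latin1' in the first place; a string already decoded as utf8 has lost the original bytes.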
