How to read files encoded with some extension of ASCII (e.g. ISO-8859-1) in Node?

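I use chardet to detect a file's encoding before reading it with fs. Until today, every file my application read was either UTF-8 or UTF-16LE. Those map easily from chardet's names to Node's BufferEncoding; for that I use chartdetToFsEncodings in the code below.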

const chartdetToFsEncodings = new Map<string, BufferEncoding>([
  ["UTF-8", "utf8"],
  ["UTF-16LE", "utf16le"],
]);

const plausableEncodings = analyse(buffer).map((match) => match.name);

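// find() returns the chardet name (e.g. "UTF-8"), so look up the matching BufferEncoding explicitly
const detectedName = plausableEncodings.find((name) => chartdetToFsEncodings.has(name));
const supportedEncoding = detectedName ? chartdetToFsEncodings.get(detectedName) : undefined;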
if (supportedEncoding) {
  resolve({
    path,
    data: buffer.toString(supportedEncoding),
  });
} else {
  reject(new Error("File encoding not recognized"));
}

But what is a good approach when chardet encounters an encoding that does not have an obvious analog in BufferEncoding? Today, for example, I ran into ISO-8859-2.
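To make the problem concrete, here is a minimal sketch of what happens when such a name is handed to Buffer directly. The sample byte is only an illustration; `Buffer.isEncoding` and `buffer.toString` are the standard Node calls.

```javascript
// Buffer only understands a fixed list of encodings.
console.log(Buffer.isEncoding('utf8'))       // true
console.log(Buffer.isEncoding('latin1'))     // true (ISO-8859-1)
console.log(Buffer.isEncoding('iso-8859-2')) // false

// Illustrative sample: a single byte from an ISO-8859-2 file.
const buffer = Buffer.from([0xB3])

let failed = false
try {
  // Passing an encoding Buffer does not know makes toString() throw.
  buffer.toString('iso-8859-2')
} catch (err) {
  failed = true
  console.log(err.message)
}
```

So a detected name has to be checked against a known set before calling `toString`, which is what the map lookup is for.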

You need to provide a decoder for those cases:

var iso88592 = require('iso-8859-2') // https://www.npmjs.com/package/iso-8859-2

const chartdetToFsEncodings = new Map([
  ['UTF-8', 'utf8'],
  ['UTF-16LE', 'utf16le'],
  // chardet reports this encoding as 'ISO-8859-2' (uppercase), so the key must match
  ['ISO-8859-2', function decodeIso88592 (buffer) {
    return iso88592.decode(buffer.toString('binary'))
  }]
])

const plausableEncodings = analyse(buffer).map((match) => match.name)

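// find() returns the chardet name, so fetch the mapped value (string or decoder) separately
const supportedName = plausableEncodings.find((name) => chartdetToFsEncodings.has(name))
const supportedEncoding = supportedName && chartdetToFsEncodings.get(supportedName)
if (supportedEncoding) {
  let data
  if (typeof supportedEncoding === 'function') {
    // custom decoder for an encoding Buffer does not support
    data = supportedEncoding(buffer)
  } else {
    // encoding natively supported by Buffer
    data = buffer.toString(supportedEncoding)
  }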

  resolve({ path, data })
} else {
  reject(new Error('File encoding not recognized'))
}
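If you would rather not add a dependency, one alternative (not part of the answer above) is Node's built-in `TextDecoder`. The sketch below assumes a Node build with full ICU, which is the default in recent releases; without it the legacy labels may be missing. `decodeWithLabel` is a hypothetical helper name used only for illustration.

```javascript
// Assumption: Node.js built with full ICU, so TextDecoder knows legacy labels
// such as 'iso-8859-2'.
const { TextDecoder } = require('util')

// Hypothetical helper: decode a Buffer using a WHATWG encoding label.
function decodeWithLabel (buffer, label) {
  // With the default options, undecodable bytes become U+FFFD instead of throwing.
  return new TextDecoder(label).decode(buffer)
}

// In ISO-8859-2, 0xB1 is "ą" (U+0105) and 0xB3 is "ł" (U+0142).
const sample = Buffer.from([0x7A, 0xB1, 0xB3])
const decoded = decodeWithLabel(sample, 'iso-8859-2')
console.log(decoded)
```

A function like this could be stored in the map in place of the package-based decoder, since both take a buffer and return a string.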

In general, APIs accept UTF-8 because it can encode every Unicode character, and the characters of latin2 (ISO-8859-2) are a subset of those.
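Note that the subset relation holds for the characters, not the bytes: the same character is stored differently in the two encodings. A small sketch, using "ł" as an arbitrary example character:

```javascript
// "ł" (U+0142) as it looks after decoding; in ISO-8859-2 it was the single byte 0xB3.
const text = '\u0142'

// Re-encoding the decoded string as UTF-8 yields two bytes, 0xC5 0x82.
const utf8Bytes = Buffer.from(text, 'utf8')
console.log(utf8Bytes.length) // 2

// The round trip through UTF-8 preserves the character.
console.log(utf8Bytes.toString('utf8') === text) // true
```

This is why the file has to be decoded with the right table first; only the decoded string can be passed on as UTF-8.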