How to read files encoded with some extension of ASCII (ISO-8859-1) in Node?
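I use chardet to detect a file's encoding before reading it with fs. Until today, every file my application read was either UTF-8 or UTF-16LE. Those map easily from chardet's names to Node's BufferEncoding; for that I use chartdetToFsEncodings in the code below.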
const chartdetToFsEncodings = new Map<string, BufferEncoding>([
  ["UTF-8", "utf8"],
  ["UTF-16LE", "utf16le"],
]);
const plausableEncodings = analyse(buffer).map((match) => match.name);
const supportedEncoding = plausableEncodings.find((name) => chartdetToFsEncodings.get(name)) as BufferEncoding;
if (supportedEncoding) {
  resolve({
    path,
    data: buffer.toString(supportedEncoding),
  });
} else {
  reject(new Error("File encoding not recognized"));
}
But what is a good approach when chardet encounters an encoding that does not have an obvious analog in BufferEncodings? Today, for example, I ran into iso-8859-2.
You need to write a decoder for those cases:
const { analyse } = require('chardet')
const iso88592 = require('iso-8859-2') // https://www.npmjs.com/package/iso-8859-2

// Values are either a Node BufferEncoding name or a custom decoder function.
const chartdetToFsEncodings = new Map([
  ['UTF-8', 'utf8'],
  ['UTF-16LE', 'utf16le'],
  // Key must match the name chardet reports (upper case in its docs).
  ['ISO-8859-2', function decodeIso88592 (buffer) {
    return iso88592.decode(buffer.toString('binary'))
  }]
])

const plausableEncodings = analyse(buffer).map((match) => match.name)
const supportedName = plausableEncodings.find((name) => chartdetToFsEncodings.has(name))
if (supportedName) {
  // find() returns the chardet name, so look up the mapped value before using it.
  const supportedEncoding = chartdetToFsEncodings.get(supportedName)
  let data
  if (typeof supportedEncoding === 'function') {
    data = supportedEncoding(buffer)
  } else {
    data = buffer.toString(supportedEncoding)
  }
  resolve({ path, data })
} else {
  reject(new Error('File encoding not recognized'))
}
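Usually APIs accept UTF-8 because it can represent every character, and the latin2 character set is a subset of what it covers. The byte values differ, though, which is why the file has to be decoded rather than read as UTF-8 directly.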
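If pulling in one package per encoding is unappealing, a possible alternative is the TextDecoder class that ships with Node. The sketch below is not from the answer above. It assumes a Node build with full ICU data (otherwise legacy labels such as iso-8859-2 are rejected by the constructor), and decodeBuffer is a hypothetical helper name. It tries each candidate name, as chardet would report them, and returns the first one the runtime can decode.

```javascript
const { TextDecoder } = require('util')

// Hypothetical helper: tries each candidate encoding name in order and
// decodes with the first one this Node runtime supports.
function decodeBuffer (buffer, candidateNames) {
  for (const name of candidateNames) {
    let decoder
    try {
      // Encoding labels are matched case-insensitively.
      decoder = new TextDecoder(name)
    } catch (err) {
      // The runtime does not know this label; try the next candidate.
      continue
    }
    return { encoding: name, data: decoder.decode(buffer) }
  }
  throw new Error('File encoding not recognized')
}

// The Polish word "Zażółć" encoded as ISO-8859-2 bytes.
const sample = Buffer.from([0x5a, 0x61, 0xbf, 0xf3, 0xb3, 0xe6])
console.log(decodeBuffer(sample, ['ISO-8859-2']).data)
```

In the code from the question, the list returned by analyse(buffer).map((match) => match.name) could be passed as candidateNames. Note that TextDecoder replaces invalid byte sequences with U+FFFD by default instead of throwing, so a wrong guess from chardet yields garbled text rather than an error.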