Converting a string from utf8 to latin1 in NodeJS

Question

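I'm using a Latin1-encoded database and can't change it to UTF-8, which means I run into problems with some of the application data. I'm using Tesseract to OCR a document (Tesseract outputs UTF-8) and tried using iconv-lite; however, it creates a buffer, which I then have to convert to a string. But again, the buffer-to-string conversion doesn't allow the "latin1" encoding.

I've read a bunch of questions and answers; however, all I found was advice about setting the client encoding and similar things.

Any ideas?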

Answer 1

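You can create a buffer from the existing UTF-8 string and then use iconv-lite to decode that buffer as Latin-1, like this:

var iconv  = require('iconv-lite');
// Buffer.from replaces the deprecated new Buffer(); it holds the string's UTF-8 bytes
var buff   = Buffer.from(tesseract_string, 'utf8');
var DB_str = iconv.decode(buff, 'ISO-8859-1');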
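To make the difference between the two encodings concrete, here is a minimal sketch that uses only Node's built-in Buffer, assuming a Node.js version recent enough to accept 'latin1' as a Buffer encoding. The sample text is hypothetical and not taken from the question.

```javascript
// Hypothetical sample text; 'é' is U+00E9.
const text = 'caf\u00e9';

// UTF-8 needs two bytes for 'é' (0xC3 0xA9) ...
const utf8Bytes = Buffer.from(text, 'utf8');
// ... while Latin-1 needs only one (0xE9).
const latin1Bytes = Buffer.from(text, 'latin1');

// Decoding the UTF-8 bytes as Latin-1 yields one character per byte,
// so the two-byte 'é' shows up as two separate characters.
const reinterpreted = utf8Bytes.toString('latin1');

console.log(utf8Bytes.length, latin1Bytes.length, reinterpreted.length);
```

Note that Buffer.from(text, 'latin1') silently truncates characters above U+00FF, so this only round-trips text that Latin-1 can actually represent.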

Answer 2

I found a way to convert a text file in any encoding to UTF-8:

var 
  fs = require('fs'),
  charsetDetector = require('node-icu-charset-detector'),
  iconvlite = require('iconv-lite');

/* Text files in a git repo may
 * use different encodings, but
 * they always need to be served
 * as standard 'utf-8'.
 */
function getFileContentsInUTF8(file_path) {
  var content = fs.readFileSync(file_path);
  var original_charset = charsetDetector.detectCharset(content);
  var jsString = iconvlite.decode(content, original_charset.toString());
  return jsString;
}

I also have a gist here: https://gist.github.com/jacargentina/be454c13fa19003cf9f48175e82304d5

Maybe you can try this, where content would be your database buffer data (in latin1 encoding).

Answer 3

Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc

For example:

const buffer = require('buffer');
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
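A self-contained variant of the same idea can be sketched as follows, assuming a Node.js build with ICU support (the default for official releases). The sample string is hypothetical and was chosen so that every character exists in Latin-1.

```javascript
const { transcode } = require('buffer');

// Hypothetical input: "Grüße", written with escapes to avoid editor encoding issues.
const utf8String = 'Gr\u00fc\u00dfe';

// Encode the string as UTF-8 bytes, then transcode those bytes to Latin-1.
const utf8Buffer = Buffer.from(utf8String, 'utf8');
const latin1Buffer = transcode(utf8Buffer, 'utf8', 'latin1');

// Each character now occupies exactly one byte.
console.log(utf8Buffer.length, latin1Buffer.length);

// Decoding the Latin-1 bytes gives back the original JavaScript string.
const roundTripped = latin1Buffer.toString('latin1');
console.log(roundTripped === utf8String);
```

The Latin-1 buffer, rather than a decoded string, is usually what should be handed to anything that expects raw Latin-1 bytes, since JavaScript strings themselves carry no byte encoding.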

encoding

utf-8

latin1

node.js