Qt5 C++ UTF-8 conversion to Windows-1250 of Romanian ș and ț characters

Question

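My application is developed in C++11 and uses Qt5. In this application, I need to store UTF-8 text as a Windows-1250 encoded file. I tried the two approaches below, and both work, except for the Romanian 'ș' and 'ț' characters :(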

1.

    auto data = QStringList() << ... <some texts here>;
    QTextStream outStream(&destFile);
    outStream.setCodec(QTextCodec::codecForName("Windows-1250"));
    foreach (auto qstr, data)
    {
        outStream << qstr << EOL_CODE;
    }

2.

    auto data = QStringList() << ... <some texts here>;
    auto *codec = QTextCodec::codecForName("Windows-1250");
    foreach (auto qstr, data)
    {
        const QByteArray encodedString = codec->fromUnicode(qstr);
        destFile.write(encodedString);
    }

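In the case of the 'ț' character (UTF-8 bytes 0xC8 0x9B), instead of the expected value 0xFE, the character is encoded and stored as 0x3F ('?'), which is unexpected.

So I am looking for any help, experience, or examples regarding text re-encoding.

Regards,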

Answer 1

Do not confuse ț with ţ. The former is what is in your post; the latter is what Windows-1250 actually supports.

The character ț from your post is T-comma, U+021B, LATIN SMALL LETTER T WITH COMMA BELOW, however:

This letter was not part of the early Unicode versions, which is why Ţ (T-cedilla, available from version 1.1.0, June 1993) is often used in digital texts in Romanian.

The character referred to in the quote is ţ, U+0163, LATIN SMALL LETTER T WITH CEDILLA (emphasis mine):

In early versions of Unicode, the Romanian letter Ț (T-comma) was considered a glyph variant of Ţ, and therefore was not present in the Unicode Standard. It is also not present in the Windows-1250 (Central Europe) code page.

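The story of ş and ș, S-cedilla and S-comma, is similar.

If you must encode to this archaic Windows-1250 code page, I'd suggest replacing the comma variants with the cedilla variants (both lowercase and uppercase) before encoding. I think Romanians will understand :)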
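
The substitution suggested above could look roughly like the following. This is only a minimal sketch in plain C++11 working on UTF-16 code units (the representation QString uses internally); the helper names `commaToCedillaChar` and `commaToCedillaText` are hypothetical and not part of Qt.

```cpp
#include <string>

// Hypothetical helper: map a Romanian comma-below letter to its cedilla
// counterpart, which Windows-1250 can represent. Any other UTF-16 code
// unit is returned unchanged.
constexpr char16_t commaToCedillaChar(char16_t c)
{
    return c == 0x0218 ? char16_t(0x015E)   // Ș (S-comma) -> Ş (S-cedilla)
         : c == 0x0219 ? char16_t(0x015F)   // ș (s-comma) -> ş (s-cedilla)
         : c == 0x021A ? char16_t(0x0162)   // Ț (T-comma) -> Ţ (T-cedilla)
         : c == 0x021B ? char16_t(0x0163)   // ț (t-comma) -> ţ (t-cedilla)
         : c;
}

// Hypothetical helper: apply the mapping to every code unit of a UTF-16
// string and return the converted copy.
std::u16string commaToCedillaText(std::u16string text)
{
    for (char16_t &c : text)
        c = commaToCedillaChar(c);
    return text;
}
```

In the Qt code from the question, the same idea can probably be applied directly to each QString before it reaches the codec, for example with `qstr.replace(QChar(0x021B), QChar(0x0163))` repeated for each of the four pairs, followed by `codec->fromUnicode(qstr)`. After the replacement, ţ should then be written as the expected 0xFE byte instead of 0x3F.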

utf-8

cp1250

c++11

qt5