Specify UTF-8 character encoding in RTF? The text (in UTF-8) is displayed correctly in SQLite

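How do I set the RTF character encoding for text that is encoded as UTF-8?

I have looked at similar questions but did not find a good solution, so I hope you can help.

The content is stored in a SQLite database. Text in a SQLite database can only be stored as UTF-8, UTF-16, or a similar encoding, which is why I have to stick with UTF-8.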

A SQLite database browser displays characters such as ë correctly.

The target program, which can only read RTF, displays these characters in a strange way.

I have tried, for example:

{\rtf1\ansi\ansicpg0\uc0...
{\rtf1\ansi\ansicpg1252\uc0...
{\rtf1\ansi\ansicpg65001\uc0...

One option is to map the special characters to their RTF character equivalents, as shown in this table.
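As a rough illustration of that mapping idea, the sketch below derives the `\'hh` escape from the character's Windows-1252 byte instead of a hand-written table. The class name `Cp1252RtfEscaper` is hypothetical, and it assumes the JDK provides the `windows-1252` charset and that the RTF reader interprets `\'hh` as Windows-1252; neither is guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class Cp1252RtfEscaper {
    // Assumption: this JDK ships the windows-1252 charset (common, but optional).
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Escapes RTF control characters and writes Windows-1252 characters as \'hh. */
    static String escape(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c);          // RTF control characters
            } else if (c < 128) {
                out.append(c);                       // plain ASCII passes through
            } else if (encoder.canEncode(c)) {
                byte b = String.valueOf(c).getBytes(CP1252)[0];
                out.append(String.format("\\'%02x", b & 0xFF));
            } else {
                out.append(c);                       // not in Windows-1252: left untouched
            }
        }
        return out.toString();
    }
}
```

Characters outside Windows-1252 are passed through unchanged here, so they would still need another treatment.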

The site you mention links to Unicode in RTF:

If the character is between 255 and 32,768, express it as \uc1\unumber*. For example, 可, character number 21,487, is \uc1\u21487* in RTF.

If the character is between 32,768 and 65,535, subtract 65,536 from it, and use the resulting negative number. For example, 道 is character 36,947, so we subtract 65,536 to get -28,589 and we have \uc1\u-28589* in RTF.

If the character is over 65,535, then we can’t express it in RTF.
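The rules quoted above can be sketched in Java roughly as follows. The class name `RtfUnicodeEscaper` is hypothetical, and the sketch uses `?` rather than `*` as the fallback character and relies on the default `\uc1` instead of repeating it; both are assumptions about what the reader accepts.

```java
final class RtfUnicodeEscaper {
    /** Writes every character above 127 as a signed 16-bit backslash-u escape. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c);   // RTF control characters
            } else if (c < 128) {
                out.append(c);                // plain ASCII passes through
            } else {
                // Casting to short gives the signed 16-bit value, so code units
                // from 32,768 upward come out negative (same as subtracting 65,536).
                out.append("\\u").append((short) c).append('?');
            }
        }
        return out.toString();
    }
}
```

With this sketch, character 21,487 becomes `\u21487?` and character 36,947 becomes `\u-28589?`, matching the arithmetic in the quote. Characters above 65,535 are emitted as two separate UTF-16 code units; whether a given reader recombines them is not something this sketch can promise.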

It looks like RTF does not know about UTF-8 at all, only about Unicode in general. Other answers for Java and C# simply use \u directly.
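Because RTF works with Unicode code points rather than UTF-8 bytes, the bytes read from the database have to be decoded before any escaping is applied. The sketch below (hypothetical class `Utf8Demo`) shows the decoding step and also what a reader produces if it treats the same bytes as a single-byte code page, which may be the cause of the strange display described in the question.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Demo {
    /** Decodes UTF-8 bytes (for example, as read from the database) into a string. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Shows the result when the same bytes are misread as ISO-8859-1. */
    static String misreadAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For the two bytes 0xC3 0xA9, `decodeUtf8` yields the single character U+00E9, while `misreadAsLatin1` yields the two characters U+00C3 and U+00A9.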

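I have read in many places that RTF has no standard solution for UTF-8.

So, after scanning half the internet, I created my own converter. If you have a standard or better solution, please let me know!

After studying this book, I created a converter based on these character mappings. Great resource.

This solved my problem. I would have preferred to reuse an existing solution for this kind of functionality, but alas, I could not find one.

The converter might look like this: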

public static String convertHtmlToRtf(String html) {
    // Collapse line breaks, then escape the RTF control characters \ { }.
    // Backslashes are doubled twice: once for Java, once for the regex/replacement syntax.
    String tmp = html.replaceAll("\\R", " ")
            .replaceAll("\\\\", "\\\\\\\\")
            .replaceAll("\\{", "\\\\{")
            .replaceAll("}", "\\\\}");
    // Hyperlinks: $1 is the URL, $2 is the link text.
    tmp = tmp.replaceAll("<a\\s+target=\"_blank\"\\s+href=[\"']([^\"']+?)[\"']\\s*>([^<]+?)</a>",
            "{\\\\field{\\\\*\\\\fldinst HYPERLINK \"$1\"}{\\\\fldrslt \\\\plain \\\\f2\\\\b\\\\fs20\\\\cf2 $2}}");
    tmp = tmp.replaceAll("<a\\s+href=[\"']([^\"']+?)[\"']\\s*>([^<]+?)</a>",
            "{\\\\field{\\\\*\\\\fldinst HYPERLINK \"$1\"}{\\\\fldrslt \\\\plain \\\\f2\\\\b\\\\fs20\\\\cf2 $2}}");

    // Simple inline and block tags.
    tmp = tmp.replaceAll("<h3>", "\\\\line{\\\\b\\\\fs30{");
    tmp = tmp.replaceAll("</h3>", "}}\\\\line\\\\line ");
    tmp = tmp.replaceAll("<b>", "{\\\\b{");
    tmp = tmp.replaceAll("</b>", "}}");
    tmp = tmp.replaceAll("<strong>", "{\\\\b{");
    tmp = tmp.replaceAll("</strong>", "}}");
    tmp = tmp.replaceAll("<i>", "{\\\\i{");
    tmp = tmp.replaceAll("</i>", "}}");
    // HTML entities; &amp; is decoded last so that "&amp;lt;" is not unescaped twice.
    tmp = tmp.replaceAll("&quot;", "\"");
    tmp = tmp.replaceAll("&copy;", "{\\\\'a9}");
    tmp = tmp.replaceAll("&lt;", "<");
    tmp = tmp.replaceAll("&gt;", ">");
    tmp = tmp.replaceAll("&amp;", "&");
    // Line breaks and paragraphs.
    tmp = tmp.replaceAll("<br/?><br/?>", "{\\\\pard \\\\par}\\\\line ");
    tmp = tmp.replaceAll("<br/?>", "\\\\line ");
    tmp = tmp.replaceAll("<BR>", "\\\\line ");
    tmp = tmp.replaceAll("<p[^>]*?>", "{\\\\pard ");
    tmp = tmp.replaceAll("</p>", " \\\\par}\\\\line ");
    tmp = convertSpecialCharsToRtfCodes(tmp);
    return "{\\rtf1\\ansi\\ansicpg0\\uc0\\deff0\\deflang0\\deflangfe0\\fs20{\\fonttbl{\\f0\\fnil Tahoma;}{\\f1\\fnil Tahoma;}{\\f2\\fnil\\fcharset0 Tahoma;}}{\\colortbl;\\red0\\green0\\blue0;\\red0\\green0\\blue255;\\red0\\green255\\blue0;\\red255\\green0\\blue0;}" + tmp + "}";
}

// Maps known special characters to their RTF hex escapes ({\'hh});
// anything unknown is logged and passed through unchanged.
private static String convertSpecialCharsToRtfCodes(String input) {
    char[] chars = input.toCharArray();
    StringBuilder sb = new StringBuilder();
    int length = chars.length;
    for (int i = 0; i < length; i++) {
        switch (chars[i]) {
            case '’':
                sb.append("{\\'92}");
                break;
            case '`':
                sb.append("{\\'60}");
                break;
            case '€':
                sb.append("{\\'80}");
                break;
            case '…':
                sb.append("{\\'85}");
                break;
            case '‘':
                sb.append("{\\'91}");
                break;
            case '̕':
                sb.append("{\\'92}");
                break;
            case '“':
                sb.append("{\\'93}");
                break;
            case '”':
                sb.append("{\\'94}");
                break;
            case '•':
                sb.append("{\\'95}");
                break;
            case '–':
            case '‒':
                sb.append("{\\'96}");
                break;
            case '—':
                sb.append("{\\'97}");
                break;
            case '©':
                sb.append("{\\'a9}");
                break;
            case '«':
                sb.append("{\\'ab}");
                break;
            case '±':
                sb.append("{\\'b1}");
                break;
            case '„':
                sb.append("\"");
                break;
            case '´':
                sb.append("{\\'b4}");
                break;
            case '¸':
                sb.append("{\\'b8}");
                break;
            case '»':
                sb.append("{\\'bb}");
                break;
            case '½':
                sb.append("{\\'bd}");
                break;
            case 'Ä':
                sb.append("{\\'c4}");
                break;
            case 'È':
                sb.append("{\\'c8}");
                break;
            case 'É':
                sb.append("{\\'c9}");
                break;
            case 'Ë':
                sb.append("{\\'cb}");
                break;
            case 'Ï':
                sb.append("{\\'cf}");
                break;
            case 'Í':
                sb.append("{\\'cd}");
                break;
            case 'Ó':
                sb.append("{\\'d3}");
                break;
            case 'Ö':
                sb.append("{\\'d6}");
                break;
            case 'Ü':
                sb.append("{\\'dc}");
                break;
            case 'Ú':
                sb.append("{\\'da}");
                break;
            case 'ß':
            case 'β':
                sb.append("{\\'df}");
                break;
            case 'à':
                sb.append("{\\'e0}");
                break;
            case 'á':
                sb.append("{\\'e1}");
                break;
            case 'ä':
                sb.append("{\\'e4}");
                break;
            case 'è':
                sb.append("{\\'e8}");
                break;
            case 'é':
                sb.append("{\\'e9}");
                break;
            case 'ê':
                sb.append("{\\'ea}");
                break;
            case 'ë':
                sb.append("{\\'eb}");
                break;
            case 'ï':
                sb.append("{\\'ef}");
                break;
            case 'í':
                sb.append("{\\'ed}");
                break;
            case 'ò':
                sb.append("{\\'f2}");
                break;
            case 'ó':
                sb.append("{\\'f3}");
                break;
            case 'ö':
                sb.append("{\\'f6}");
                break;
            case 'ú':
                sb.append("{\\'fa}");
                break;
            case 'ü':
                sb.append("{\\'fc}");
                break;
            default:
                if (chars[i] != ' ' && Character.isSpaceChar(chars[i])) {
                    // Exotic space characters are replaced by a plain space.
                    System.out.print(".");
                    //sb.append("{\\~}");
                    sb.append(" ");
                } else if (chars[i] == 8218) {
                    // U+201A (single low-9 quotation mark) looks like a comma.
                    System.out.println("Strange comma ... ");
                    sb.append(",");
                } else if (chars[i] > 132) {
                    System.err.println("Special code that is not translated in RTF: '" + chars[i] + "', number=" + (int) chars[i]);
                    sb.append(chars[i]);
                } else {
                    sb.append(chars[i]);
                }
        }
    }
    return sb.toString();
}