How to convert accented characters in Java

I am using Java 1.5 and I need to normalize a string (like this: àèìòù ---> aeiou). I can't use Normalizer because it requires Java 1.6 or later. Any ideas?

I tried this:

public String normalizeText(String text) {
    text = normalizer(text);
    text = text.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    return text;
}

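// Requires: import java.lang.reflect.Method;
// Looks up java.text.Normalizer through reflection so that this class still compiles on Java 1.5.
public static String normalizer(String word) {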
    try {
        int i;
        Class<?> normalizerClass = Class.forName("java.text.Normalizer");
        Class<?> normalizerFormClass = null;
        Class<?>[] nestedClasses = normalizerClass.getDeclaredClasses();
        for (i = 0; i < nestedClasses.length; i++) {
            Class<?> nestedClass = nestedClasses[i];
            if (nestedClass.getName().equals("java.text.Normalizer$Form")) {
                normalizerFormClass = nestedClass;
            }
        }
        assert normalizerFormClass.isEnum();
        Method methodNormalize = normalizerClass.getDeclaredMethod(
                "normalize",
                CharSequence.class,
                normalizerFormClass);
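        // NFD decomposes each accented letter into a base letter plus combining marks,
        // which the regex in normalizeText can then strip. NFC would recompose them.
        Object nfdNormalization = null;
        Object[] constants = normalizerFormClass.getEnumConstants();
        for (i = 0; i < constants.length; i++) {
            Object constant = constants[i];
            if (constant.toString().equals("NFD")) {
                nfdNormalization = constant;
            }
        }
        return (String) methodNormalize.invoke(null, word, nfdNormalization);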
    } catch (Exception ex) { return null; }
}
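
For comparison, the reflection in normalizer is only there to keep the class loadable on Java 1.5. On Java 1.6 or later the same idea can be written directly. A minimal sketch, assuming java.text.Normalizer is available (it is not on 1.5); the class name AccentStripper and the method name stripAccents are made up for illustration:

```java
import java.text.Normalizer;

public class AccentStripper {
    // NFD splits each accented letter into a base letter followed by
    // combining marks; the regex then deletes those marks.
    public static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // The argument is a, e, i, o, u with grave accents, written as
        // Unicode escapes to avoid source-encoding problems.
        System.out.println(stripAccents("\u00e0\u00e8\u00ec\u00f2\u00f9")); // prints "aeiou"
    }
}
```

Note that the decomposition form matters here: with NFC the accented letters stay as single precomposed characters and the regex has nothing to remove.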

Write your own method

If you can't use Normalizer, another good approach is a Map that sends every accented variant of a letter to its plain form.

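// Requires: import java.util.HashMap; import java.util.Map;
// Java 1.5 has no diamond operator, so the type arguments are spelled out.
// Keys and values are char literals (autoboxed to Character), not Strings.
Map<Character, Character> rep = new HashMap<Character, Character>();
rep.put('à', 'a');
rep.put('è', 'e');
rep.put('ì', 'i');
rep.put('ò', 'o');
rep.put('ù', 'u');
// etc...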

This is long and ugly, so loading the mappings from a text file is better.
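
One way to load the table from a text file, sketched under assumptions: the file format (one accented=plain pair per line, with # starting a comment line), the class name AccentTable, and its method names are all hypothetical. It reads from a Reader so the same code works for a file or an in-memory string, and it sticks to Java 1.5 syntax and classes.

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class AccentTable {
    // Parses lines of the assumed form "X=y", where X is the accented
    // character and y its replacement. Blank lines and lines starting
    // with '#' are skipped; anything else is rejected.
    public static Map<Character, Character> load(Reader source) {
        Map<Character, Character> table = new HashMap<Character, Character>();
        Scanner lines = new Scanner(source);
        while (lines.hasNextLine()) {
            String line = lines.nextLine().trim();
            if (line.length() == 0 || line.charAt(0) == '#') {
                continue;
            }
            if (line.length() != 3 || line.charAt(1) != '=') {
                throw new IllegalArgumentException("Malformed mapping line: " + line);
            }
            table.put(Character.valueOf(line.charAt(0)), Character.valueOf(line.charAt(2)));
        }
        return table;
    }

    // Replaces every character that has an entry in the table and
    // leaves all other characters unchanged.
    public static String apply(Map<Character, Character> table, String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character replacement = table.get(Character.valueOf(c));
            out.append(replacement != null ? replacement.charValue() : c);
        }
        return out.toString();
    }
}
```

With a real file, wrapping a FileInputStream in new InputStreamReader(stream, "UTF-8") decodes the accented characters with a known charset instead of the platform default.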


Existing answer

On this page I found the following answer. It works; I have tested it:

Mirror of the Unicode table from 00c0 to 017f, without diacritics.

private static final String tab00c0 = "AAAAAAACEEEEIIII" +
    "DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
    "aaaaaaaceeeeiiii" +
    "\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
    "AaAaAaCcCcCcCcDd" +
    "DdEeEeEeEeEeGgGg" +
    "GgGgHhHhIiIiIiIi" +
    "IiJjJjKkkLlLlLlL" +
    "lLlNnNnNnnNnOoOo" +
    "OoOoRrRrRrSsSsSs" +
    "SsTtTtTtUuUuUuUu" +
    "UuUuWwYyYZzZzZzF";

Returns the string without diacritics (a 7-bit approximation).

public static String removeDiacritic(String source) {
    char[] vysl = new char[source.length()];
    char one;
    for (int i = 0; i < source.length(); i++) {
        one = source.charAt(i);
        if (one >= '\u00c0' && one <= '\u017f') {
            one = tab00c0.charAt((int) one - '\u00c0');
        }
        vysl[i] = one;
    }
    return new String(vysl);
}