Convert accented characters to English using Java

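I have a requirement to support searching with accented characters entered by users from Iceland and Japan. The code I wrote works for some accented characters, but not all. Examples below: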

À - returns a. Correct.
 - returns a. Correct.
Ð - returns Ð. This is breaking. It should return e.
Õ - returns Õ. This is breaking. It should return o.

Here is my code:

String accentConvertStr = StringUtils.stripAccents(myKey);

I also tried this:

import java.nio.charset.StandardCharsets;

byte[] b = key.getBytes("Cp1252"); // throws UnsupportedEncodingException
System.out.println(new String(b, StandardCharsets.UTF_8));
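The second attempt does not remove accents at all: it encodes the string as windows-1252 ("Cp1252") bytes and then decodes those bytes as UTF-8. A minimal sketch (the class and method names are made up for illustration) suggests what happens: each accented letter becomes a single windows-1252 byte, which is not a valid UTF-8 sequence on its own, so it comes back as the replacement character U+FFFD instead of a plain letter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode with windows-1252 (Java alias "Cp1252"), then decode the bytes as UTF-8,
    // mirroring the second attempt in the question.
    static String roundTrip(String s) {
        byte[] b = s.getBytes(Charset.forName("windows-1252"));
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // À, Ð, Õ written as escapes so the source compiles under any encoding
        for (String s : new String[]{"\u00C0", "\u00D0", "\u00D5"}) {
            String out = roundTrip(s);
            // Print the first resulting code point in hex to make U+FFFD visible
            System.out.printf("%s -> U+%04X%n", s, out.codePointAt(0));
        }
    }
}
```

Decoding the same bytes with windows-1252 instead would just give back the original accented letter, so no pairing of charsets performs accent stripping.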

Please advise.

I'd say it is working as expected. Under the hood, StringUtils.stripAccents essentially does the following:

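import java.text.Normalizer;

String[] chars = new String[]{"À", "Â", "Ð", "Õ"};

for (String c : chars) {
  // NFD splits a precomposed letter into its base letter plus combining marks
  String normalized = Normalizer.normalize(c, Normalizer.Form.NFD);
  // Drop the combining marks (the backslash must be doubled in a Java string literal)
  System.out.println(normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
}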

This outputs: A A Ð O
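The reason Ð survives is that Unicode gives it no canonical decomposition. À is canonically equivalent to A followed by a combining grave accent (U+0300), so NFD splits it and the regex removes the mark; Ð (U+00D0, the Icelandic eth) is a letter in its own right, so NFD leaves it unchanged and there is nothing to strip. A small sketch for inspecting the NFD form (the class name is hypothetical):

```java
import java.text.Normalizer;

class Decompose {
    // Return the NFD form of s as space-separated code points, e.g. "U+0041 U+0300"
    static String codePoints(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        nfd.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // À Â Ð Õ, written as escapes
        for (String s : new String[]{"\u00C0", "\u00C2", "\u00D0", "\u00D5"}) {
            System.out.println(s + " -> " + codePoints(s));
        }
    }
}
```

For the four characters above this prints U+0041 U+0300, U+0041 U+0302, U+00D0 and U+004F U+0303: only Ð has no combining mark to remove.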

If you read that answer, you will find this caveat:

Be aware that that will not remove what you might think of as “accent” marks from all characters! There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.
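For the search use case, one alternative to rewriting the strings is to compare them with `java.text.Collator` at `PRIMARY` strength, which ignores accent (secondary) and case (tertiary) differences. This is a sketch, not part of the original answer, and the method name is hypothetical; how a particular letter such as Ð or ø is ordered depends on the collation rules of the locale you choose, so check the letters you actually care about.

```java
import java.text.Collator;
import java.util.Locale;

class AccentInsensitiveMatch {
    // True if a and b differ at most in accents and case.
    // PRIMARY strength: only base-letter differences make compare() non-zero.
    static boolean sameBaseLetters(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameBaseLetters("r\u00E9sum\u00E9", "resume")); // résumé vs resume
        System.out.println(sameBaseLetters("\u00D5", "o"));                // Õ vs o
    }
}
```

This compares whole strings; a substring search built on it would need more work, which is one reason folding the text to plain letters up front is often simpler.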

If you still want to use StringUtils.stripAccents, you will need to handle those characters separately.
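One way to handle them separately is to replace the letters that have no decomposition first and then run the same NFD-and-strip step. The sketch below uses only the JDK (no Apache Commons) and a hypothetical fallback table; the mappings (for example Ð to D, Þ to Th) are illustrative choices, so adjust them to whatever your search actually needs.

```java
import java.text.Normalizer;
import java.util.Map;

class AccentFold {
    // Hypothetical fallback table for letters that NFD does not decompose.
    // The chosen replacements are assumptions; extend or change them as needed.
    private static final Map<Character, String> FALLBACK = Map.of(
        '\u00D0', "D",  '\u00F0', "d",   // Ð ð (eth)
        '\u00DE', "Th", '\u00FE', "th",  // Þ þ (thorn)
        '\u00C6', "AE", '\u00E6', "ae",  // Æ æ
        '\u00D8', "O",  '\u00F8', "o",   // Ø ø
        '\u0110', "D",  '\u0111', "d");  // Đ đ

    static String fold(String s) {
        // 1) Replace characters that have no canonical decomposition
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            String repl = FALLBACK.get(ch);
            sb.append(repl != null ? repl : String.valueOf(ch));
        }
        // 2) Decompose (NFD) and drop combining marks, as in the snippet above
        String nfd = Normalizer.normalize(sb, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00C0\u00C2\u00D0\u00D5")); // ÀÂÐÕ
    }
}
```

With this table, ÀÂÐÕ folds to AADO; applying the same `fold` to both the stored text and the user's query keeps the comparison consistent.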

Please try https://github.com/xuender/unidecode; it seems to fit your case.

 String normalized = Unidecode.decode(input);