Converting String which contains Turkish characters to lowercase

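I want to convert a String that contains Turkish characters to lowercase, with the Turkish characters mapped to their English equivalents, i.e. "İĞŞÇ" -> "igsc".

When I use toLowerCase(new Locale("en", "US")), it converts İ, for example, to an i that still carries a dot above it (i̇).

How can I solve this? (I'm using Java 7.)

Thanks.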

You can do this in two steps:

1) First, remove the accents.

The following is quoted from this thread:

Is there a way to get rid of accents and convert a whole string to regular letters?

Use java.text.Normalizer to handle this for you.

string = Normalizer.normalize(string, Normalizer.Form.NFD);

This will separate all of the accent marks from the characters. Then, you just need to compare each character against being a letter and throw out the ones that aren't.

string = string.replaceAll("[^\\p{ASCII}]", "");

If your text is in Unicode, you should use this instead:

string = string.replaceAll("\\p{M}", "");

For Unicode, \P{M} matches the base glyph and \p{M} (lowercase) matches each accent. Note that inside a Java string literal the backslash has to be doubled, as in the two calls above.

2) Then, just convert the remaining String to lowercase:

string = string.toLowerCase(Locale.ENGLISH); // fixed locale: under a Turkish default locale, "I" would lowercase to dotless "ı"
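
The two steps above can be combined into one small helper. This is only a minimal sketch: the class name TurkishToAscii and the method name toAsciiLowerCase are made up for illustration, and the extra replacement of the dotless "ı" (U+0131) is an assumption that is not part of the quoted answer. It is added because that letter has no Unicode decomposition, so NFD normalization leaves it untouched.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class; the names are illustrative, not from any library.
final class TurkishToAscii {

    static String toAsciiLowerCase(String input) {
        // Step 1a: decompose, e.g. capital I with dot above (U+0130)
        // becomes "I" followed by the combining dot above (U+0307).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);

        // Step 1b: drop every combining mark (Unicode category M).
        String stripped = decomposed.replaceAll("\\p{M}", "");

        // Assumption: dotless i (U+0131) does not decompose, so map it by hand.
        stripped = stripped.replace('\u0131', 'i');

        // Step 2: lowercase with a fixed locale so the result does not
        // depend on the default locale of the JVM.
        return stripped.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // The input is the string from the question, written with escapes.
        System.out.println(toAsciiLowerCase("\u0130\u011E\u015E\u00C7")); // prints "igsc"
    }
}
```

Stripping the marks before lowercasing means the string is already plain ASCII when toLowerCase runs, so the combining dot described in the question never appears.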
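// Lowercasing with the Turkish locale (requires: import java.util.Locale;)
String testString = "İĞŞÇ";
System.out.println(testString);                 // İĞŞÇ
// Language and country are separate arguments; new Locale("tr-TR") does not give the Turkish locale.
Locale trLocale = new Locale("tr", "TR");
testString = testString.toLowerCase(trLocale);
System.out.println(testString);                 // iğşç (the Turkish letters are kept, not mapped to ASCII)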

Works like a charm :)
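
To make the effect of the locale visible, here is a small sketch that prints the UTF-16 code units produced by toLowerCase. The class name LowerCaseLocaleDemo and the helper codeUnits are hypothetical and exist only for this illustration; everything else is standard JDK API.

```java
import java.util.Locale;

// Hypothetical demo class; the names are illustrative only.
final class LowerCaseLocaleDemo {

    // Renders each char as U+XXXX so that invisible combining marks show up.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");
        String dottedCapitalI = "\u0130"; // capital I with dot above

        // Non-Turkish locale: lowercase i followed by a combining dot above.
        System.out.println(codeUnits(dottedCapitalI.toLowerCase(Locale.US)));
        // prints "U+0069 U+0307"

        // Turkish locale: a single plain lowercase i.
        System.out.println(codeUnits(dottedCapitalI.toLowerCase(turkish)));
        // prints "U+0069"

        // Turkish locale: capital I without a dot becomes dotless i.
        System.out.println(codeUnits("I".toLowerCase(turkish)));
        // prints "U+0131"
    }
}
```

The Turkish locale therefore avoids the stray dot on İ, but it leaves ğ, ş and ç in place and turns a plain I into ı, so on its own it does not produce "igsc".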