Converting String which contains Turkish characters to lowercase
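I want to convert a string that contains Turkish characters to lowercase and map the Turkish characters to their English equivalents, i.e. "İĞŞÇ" -> "igsc".
When I call toLowerCase(new Locale("en", "US")), it converts İ, for example, to an i that still carries a dot above it (i followed by the combining dot above, U+0307).
How can I solve this? (I am using Java 7.)
Thanks.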
You can:
1) First, remove the accents.
The following comes from the thread "Is there a way to get rid of accents and convert a whole string to regular letters?":
Use java.text.Normalizer to handle this for you.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
This will separate all of the accent marks from the characters. Then, you just need to compare each character against being a letter and throw out the ones that aren't.
string = string.replaceAll("[^\\p{ASCII}]", "");
If your text is in Unicode, you should use this instead:
string = string.replaceAll("\\p{M}", "");
For Unicode, \P{M} matches the base glyph and \p{M} (lowercase) matches each accent. (Inside a Java string literal the backslash has to be doubled, as shown.)
2) Then just convert the remaining String to lowercase. Pass an explicit non-Turkish locale: on a JVM whose default locale is Turkish, the no-argument toLowerCase() turns I into the dotless ı.
string = string.toLowerCase(Locale.ENGLISH);
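The two steps above can be combined into one helper. This is only a minimal sketch: the class name TurkishFold and the method name toAsciiLower are made up for illustration, and the extra replacement of the dotless ı (U+0131) is an assumption about what "English equivalent" should mean, since that letter has no decomposition and NFD leaves it untouched. The test string is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class TurkishFold {

    static String toAsciiLower(String input) {
        // NFD splits e.g. U+0130 (İ) into 'I' + U+0307 (combining dot above).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove every combining mark, leaving only the base letters.
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Locale.ROOT keeps 'I' -> 'i' even if the default locale is Turkish.
        String lower = stripped.toLowerCase(Locale.ROOT);
        // Assumption: also map the dotless i (U+0131) to a plain 'i'.
        return lower.replace('\u0131', 'i');
    }

    public static void main(String[] args) {
        // "\u0130\u011E\u015E\u00C7" is "İĞŞÇ"
        System.out.println(toAsciiLower("\u0130\u011E\u015E\u00C7")); // prints "igsc"
    }
}
```

Everything used here (java.text.Normalizer, Locale.ROOT, String.replaceAll) is available in Java 7.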
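String testString = "İĞŞÇ";
System.out.println(testString);
// Language and country are separate constructor arguments;
// new Locale("tr-TR") is not recognized as Turkish.
Locale trLocale = new Locale("tr", "TR");
testString = testString.toLowerCase(trLocale);
System.out.println(testString);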
Works like a charm :)
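For comparison, here is a small sketch of what Turkish-locale lowercasing does and does not do. The class and method names (TurkishLowerDemo, lowerTurkish) are hypothetical. With the Turkish locale, İ becomes a plain i, but Ğ, Ş and Ç only become ğ, ş and ç, so this approach alone does not produce "igsc"; stripping the accents is still needed if ASCII output is the goal.

```java
import java.util.Locale;

// Hypothetical demo class, for illustration only.
final class TurkishLowerDemo {

    // Lowercase using Turkish casing rules (language "tr", country "TR").
    static String lowerTurkish(String s) {
        return s.toLowerCase(new Locale("tr", "TR"));
    }

    public static void main(String[] args) {
        String input = "\u0130\u011E\u015E\u00C7"; // "İĞŞÇ"

        // Turkish rules: İ -> i, Ğ -> ğ, Ş -> ş, Ç -> ç
        String turkish = lowerTurkish(input);
        System.out.println(turkish.equals("i\u011F\u015F\u00E7"));

        // Locale-neutral rules: İ -> 'i' + U+0307, the "dotted" i from the question
        String neutral = "\u0130".toLowerCase(Locale.ROOT);
        System.out.println(neutral.equals("i\u0307"));

        // Turkish rules also change plain ASCII: 'I' -> dotless ı (U+0131)
        System.out.println(lowerTurkish("I").equals("\u0131"));
    }
}
```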