Swift: remove diacritics from Arabic text
I am trying to remove the diacritics from Arabic text. For example, I need to convert َب to ب. Here is my code:
if (text != "") {
    for char in text! {
        print(char)
        print(char.unicodeScalars.first?.value)
        if allowed.contains("\(char)") {
            newText.append(char)
        }
    }
    self.textView.text = text!
} else {
    // TODO:
    // show an alert
    print("uhhh no way")
}
I tried these solutions, without success:
NSString : easy way to remove UTF-8 accents from a string?
You can use CFStringTransform with kCFStringTransformStripCombiningMarks to remove combining marks (accents or diacritics):
import Foundation

let original = "ََب"
// CFStringTransform edits the string in place, so it needs a mutable copy.
let mutableString = NSMutableString(string: original) as CFMutableString
// `false` applies the transform forwards rather than in reverse.
CFStringTransform(mutableString, nil, kCFStringTransformStripCombiningMarks, false)
let normalized = mutableString as String
print(normalized)
CFStringTransform
Performs an in-place transliteration on a mutable string.
kCFStringTransformStripCombiningMarks
The identifier of a transform to strip combining marks (accents or diacritics).
You can use a regular expression. Try this code:
import Foundation

let myString = "الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"
// The backslashes are doubled so that the regex engine, not Swift, reads \uXXXX.
let regex = try! NSRegularExpression(pattern: "[\\u064b-\\u064f\\u0650-\\u0652]")
// NSRange counts UTF-16 code units, so build it from the string itself.
let range = NSRange(myString.startIndex..., in: myString)
let modString = regex.stringByReplacingMatches(in: myString, options: [], range: range, withTemplate: "")
print(modString)
Output: الحمد لله رب العالمين
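The character loop in the question may fail because a Swift Character is a whole grapheme cluster: a letter together with its combining marks is a single Character, so the marks never show up on their own. A minimal sketch of the same idea done over Unicode scalars follows. The function name `removingHarakat` is hypothetical, and the range U+064B...U+0652 is simply taken from the regular expression above, so it covers only those marks.

```swift
// Sketch: remove Arabic harakat by filtering Unicode scalars
// instead of Characters (grapheme clusters).
func removingHarakat(_ text: String) -> String {
    // Assumption: same code point range as the regex above (U+064B...U+0652).
    let harakat: ClosedRange<UInt32> = 0x064B...0x0652
    var result = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where !harakat.contains(scalar.value) {
        result.append(scalar)
    }
    return String(result)
}

// Prints the sentence with the harakat removed.
print(removingHarakat("الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"))
```

This needs only the Swift standard library. Scalars outside the range, such as Latin letters, pass through unchanged.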
Use this extension:
import Foundation

extension String {
    /// Strip combining marks (accents or diacritics).
    var stripDiacritics: String {
        let mStringRef = NSMutableString(string: self) as CFMutableString
        CFStringTransform(mStringRef, nil, kCFStringTransformStripCombiningMarks, false)
        return mStringRef as String
    }
}
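For comparison, here is a sketch of a similar extension that uses only the Swift standard library (Swift 5 or later). It is a different technique from the transform above and not equivalent to it: it drops every scalar whose Unicode general category is "nonspacing mark", in any script, and it does not decompose precomposed letters first. The property name `strippingNonspacingMarks` is hypothetical.

```swift
extension String {
    /// Hypothetical helper: drops every Unicode scalar whose
    /// general category is Mn (nonspacing mark).
    var strippingNonspacingMarks: String {
        var scalars = String.UnicodeScalarView()
        for scalar in unicodeScalars
        where scalar.properties.generalCategory != .nonspacingMark {
            scalars.append(scalar)
        }
        return String(scalars)
    }
}

// Usage: prints the sentence with its nonspacing marks removed.
print("الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ".strippingNonspacingMarks)
```

Because it does no decomposition, a precomposed letter such as "é" (U+00E9) is left untouched, whereas "e" followed by U+0301 loses its accent.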
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Arabic preprocessing with pyarabic
from pyarabic.araby import strip_harakat, strip_tashkeel

text = u'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ'

# Strip harakat from Arabic text, keeping the shadda.
print(strip_harakat(text))
# الحمد للّه ربّ العالمين

# Strip all tashkeel (vowel marks), including the shadda.
print(strip_tashkeel(text))
# الحمد لله رب العالمين