Swift: remove diacritics from Arabic text

I am trying to remove the diacritics from Arabic text. For example, I need to convert َب to ب. Here is my code:

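// `allowed` and `newText` are declared elsewhere (not shown).
if let text = text, !text.isEmpty {
    for char in text {
        print(char)
        print(char.unicodeScalars.first?.value as Any)
        // Keep only the characters that appear in the allowed set.
        if allowed.contains("\(char)") {
            newText.append(char)
        }
    }
    // Show the filtered text, not the original input.
    self.textView.text = newText
} else {
    // TODO: show an alert
    print("uhhh no way")
}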

I tried these solutions, but they did not work:

NSString : easy way to remove UTF-8 accents from a string?

You can use CFStringTransform together with kCFStringTransformStripCombiningMarks to strip combining marks (accents or diacritics):

import Foundation

let original = "ََب"
let mutableString = NSMutableString(string: original) as CFMutableString
// Strip combining marks in place; `false` means the transform is not reversed.
CFStringTransform(mutableString, nil, kCFStringTransformStripCombiningMarks, false)
let normalized = mutableString as String

print(normalized)

CFStringTransform

A constant containing the transformation of a string by removing combining marks.

kCFStringTransformStripCombiningMarks

The identifier of a transform to strip combining marks (accents or diacritics).
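
Foundation also exposes this transform on String through applyingTransform(_:reverse:) with StringTransform.stripCombiningMarks, which should correspond to the CoreFoundation constant above. A minimal sketch, assuming iOS 9 / macOS 10.11 or later; the helper name strippingCombiningMarks is made up for illustration. Note that the transform is not specific to Arabic, so Latin accents are stripped as well.

```swift
import Foundation

/// Hypothetical helper (not part of Foundation): returns a copy of `s`
/// with combining marks removed.
func strippingCombiningMarks(_ s: String) -> String {
    // applyingTransform returns nil if the transform cannot be applied;
    // in that case fall back to the unchanged input.
    return s.applyingTransform(.stripCombiningMarks, reverse: false) ?? s
}

// BEH + FATHA, written with explicit scalars so the mark is visible in source.
let marked = "\u{0628}\u{064E}"
print(strippingCombiningMarks(marked).unicodeScalars.count)
print(strippingCombiningMarks("café"))
```

If accents in other scripts must survive, a filter limited to the Arabic mark range is the safer choice.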

You can use a regular expression. Try this code:

import Foundation

let myString = "الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"
// Match the Arabic marks U+064B through U+0652 (tanwin, fatha, damma, kasra, shadda, sukun).
let regex = try! NSRegularExpression(pattern: "[\\u064B-\\u0652]")
// NSRange counts UTF-16 code units, so derive it from the string's indices.
let range = NSRange(myString.startIndex..., in: myString)
let modString = regex.stringByReplacingMatches(in: myString, options: [], range: range, withTemplate: "")
print(modString)

Output : الحمد لله رب العالمين
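
One reason the character-by-character loop in the question cannot work: a Swift Character is an extended grapheme cluster, so a letter and the marks attached to it arrive as a single Character and a per-Character test never sees the mark on its own. Below is a minimal sketch of the same U+064B to U+0652 filter applied to unicodeScalars instead, without a regex. The property name removingHarakat is hypothetical.

```swift
extension String {
    /// Hypothetical helper: drops the scalars U+064B...U+0652
    /// (tanwin, fatha, damma, kasra, shadda, sukun) and keeps everything else.
    var removingHarakat: String {
        let harakat: ClosedRange<UInt32> = 0x064B...0x0652
        var kept = String.UnicodeScalarView()
        for scalar in unicodeScalars where !harakat.contains(scalar.value) {
            kept.append(scalar)
        }
        return String(kept)
    }
}

// BEH + FATHA is one Character but two Unicode scalars.
let marked = "\u{0628}\u{064E}"
print(marked.count, marked.unicodeScalars.count)
print(marked.removingHarakat == "\u{0628}")
```

Working on scalars also sidesteps the NSRange bookkeeping that the regex version needs.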

Use this extension:

extension String {
    /// strip combining marks (accents or diacritics)
    var stripDiacritics: String {
        let mStringRef = NSMutableString(string: self) as CFMutableString
        CFStringTransform(mStringRef, nil, kCFStringTransformStripCombiningMarks, false)
        return mStringRef as String
    }
}

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Arabic text preprocessing with PyArabic

text = u'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ'

# Strip harakat from the text, keeping shadda.
from pyarabic.araby import strip_harakat
print(strip_harakat(text))
# الحمد للّه ربّ العالمين

# Strip all tashkeel (vowel marks) from the text, including shadda.
from pyarabic.araby import strip_tashkeel
print(strip_tashkeel(text))
#الحمد لله رب العالمين
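
Since the question is about Swift, here is a rough Swift counterpart of the two calls above. It is a sketch, not a port of PyArabic: the function name strippingTashkeel and the keepShadda parameter are invented here, and the mark set is assumed to be U+064B to U+0652 with shadda at U+0651.

```swift
/// Hypothetical helper: removes the Arabic marks U+064B...U+0652.
/// With `keepShadda: true`, SHADDA (U+0651) is preserved.
func strippingTashkeel(_ text: String, keepShadda: Bool = false) -> String {
    let marks: ClosedRange<UInt32> = 0x064B...0x0652
    let shadda: UInt32 = 0x0651
    var kept = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        let value = scalar.value
        // Keep every non-mark scalar, plus shadda when requested.
        if !marks.contains(value) || (keepShadda && value == shadda) {
            kept.append(scalar)
        }
    }
    return String(kept)
}

// REH + FATHA + BEH + SHADDA + KASRA
let word = "\u{0631}\u{064E}\u{0628}\u{0651}\u{0650}"
print(strippingTashkeel(word, keepShadda: true))
print(strippingTashkeel(word))
```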