How to remove tashkeel from a string in Lua?
I am writing a simple function that should remove tashkeel (diacritics) from Arabic text. The replace technique works for English but not for Arabic. What do you suggest?
Lua code:
function replacePartOfString(arg,old,new)
local zzz = arg.gsub(arg, old, new)
return zzz
end
function wordLengthIgnoringTashkeel(arg)
local tashkeelArray = {"َ","ً","ُ","ٌ","ِ","ٍ","ْ","َ"}
local tempWord = arg
print("tempWord Before"..tempWord)
for x=1,#tashkeelArray do
replacePartOfString(tempWord,tashkeelArray[x],"")
end
print("tempWord After"..tempWord)
end
result
tempWord Beforeاليَوْمَ
tempWord Afterاليَوْمَ
expected result
tempWord Beforeاليَوْمَ
tempWord Afterاليوم
This works:
function replacePartOfString(arg,old,new)
return arg.gsub(arg, old, new)
end
function wordLengthIgnoringTashkeel(arg)
local tashkeelArray = {"َ","ً","ُ","ٌ","ِ","ٍ","ْ","َّ"}
local tempWord = arg
for x=1,#tashkeelArray do
tempWord = replacePartOfString(tempWord,tashkeelArray[x],"")
end
return #tempWord
end
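One caveat about the version above: `#tempWord` is the length in bytes, and each Arabic letter takes two bytes in UTF-8, so for pure Arabic text the result is double the letter count. A minimal sketch of a character count, assuming the input is valid UTF-8 (the helper name `utf8Length` is made up for illustration and is not part of the answer):

```lua
-- Hypothetical helper, not part of the original answer.
-- Counts UTF-8 characters by counting every byte that is NOT a
-- continuation byte (continuation bytes are in the range 128..191).
function utf8Length(s)
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

-- The word "اليوم" written as UTF-8 byte escapes so the source stays ASCII
local word = "\216\167\217\132\217\138\217\136\217\133"
print(#word)            -- 10 (bytes)
print(utf8Length(word)) -- 5 (characters)
```

On Lua 5.3 or later, the built-in `utf8.len(s)` should give the same count for valid input.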
function wordLengthIgnoringTashkeel(arg)
local tashkeelArray = {"َ","ً","ُ","ٌ","ِ","ٍ","ْ","̶"}
local tempWord = arg
print("tempWord Before"..tempWord)
for x=1,#tashkeelArray do
tempWord = string.gsub(tempWord, tashkeelArray[x],"")
end
print("tempWord After"..tempWord)
end
wordLengthIgnoringTashkeel("يَوْمو")
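The loop above can also be collapsed into a single `gsub` call. Lua patterns work on bytes, and the code points U+064B to U+0652 (fathatan through sukun, which includes shadda) all encode in UTF-8 as byte 217 followed by one byte from 139 to 146. A rough sketch under the assumption that the input is valid UTF-8; `stripTashkeel` is a name invented here:

```lua
-- Hypothetical helper: removes U+064B..U+0652 in one pass.
function stripTashkeel(s)
  -- The extra parentheses drop gsub's second return value (the match count).
  return (string.gsub(s, "\217[\139-\146]", ""))
end

-- The word from the question, with its three marks, as UTF-8 byte escapes:
-- alef, lam, yeh, fatha, waw, sukun, meem, fatha
local word = "\216\167\217\132\217\138\217\142\217\136\217\146\217\133\217\142"
local stripped = stripTashkeel(word)
print(stripped)
print(#word, #stripped) -- byte lengths: 16 and 10
```

Because byte 217 can only start a two-byte sequence in valid UTF-8, the pattern should not match in the middle of another character.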
This code may help you; it worked for me. For a single file:
perl -CS -pe 's/[\x{064B}-\x{0650}]|[\x{0618}-\x{061A}]|[\x{0652}-\x{0653}]|[\x{0652}-\x{0653}]+//g' < "$f" > "$f.txt" ;
For all files in a folder:
for f in *.txt; do
perl -CS -pe 's/[\x{064B}-\x{0650}]|[\x{0618}-\x{061A}]|[\x{0652}-\x{0653}]|[\x{0652}-\x{0653}]+//g' < "$f" > "$f.txt" ;
done
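For comparison with the Lua answers, the same code point ranges as the Perl expression can be filtered in Lua itself. This is only a sketch: it assumes Lua 5.3 or later (for the standard `utf8` library) and valid UTF-8 input, and the names `RANGES`, `inRanges` and `stripMarks` are invented for illustration. Like the Perl version, it leaves shadda (U+0651) in place.

```lua
-- Code point ranges taken from the Perl expression above.
local RANGES = {
  {0x0618, 0x061A},
  {0x064B, 0x0650},
  {0x0652, 0x0653},
}

-- Hypothetical helper: true if the code point falls in one of the ranges.
function inRanges(cp)
  for _, r in ipairs(RANGES) do
    if cp >= r[1] and cp <= r[2] then
      return true
    end
  end
  return false
end

-- Hypothetical helper: rebuilds the string without the listed marks.
-- utf8.codes raises an error if s is not valid UTF-8.
function stripMarks(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    if not inRanges(cp) then
      out[#out + 1] = utf8.char(cp)
    end
  end
  return table.concat(out)
end

-- The word from the question as UTF-8 byte escapes
local word = "\216\167\217\132\217\138\217\142\217\136\217\146\217\133\217\142"
print(stripMarks(word))
```

Working on code points rather than bytes keeps the ranges readable, at the cost of requiring a newer Lua.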
Regards