Hebrew dictionary: remove all non-Hebrew characters
I want to create a Hebrew dictionary.
After deleting everything that contains non-Hebrew characters, add patram.
from unidecode import unidecode
import random
import re

# Random number for the output file name (renamed so it does not shadow the random module)
suffix = random.randint(1000, 2000)
n = input("HebrewFileName?:")
with open(str(n) + ".txt", encoding='utf-8') as fname:
    text = fname.read()
# Strip punctuation
res = re.sub('[!,*)@|#%(&$_?.^]', '', text)
# Keep each word once
lst = list(set(res.split()))
# Drop words that start with a Latin letter
lines1 = list(filter(lambda w: not re.match(r'[a-zA-Z]+', w), lst))
text1 = "\n".join(lines1)
# Remove digits
text2 = ''.join(filter(lambda x: not x.isdigit(), text1))
# Append the result to the output file and close it afterwards
with open(str(suffix) + "-.txt", "a", encoding='utf-8') as out:
    print(text2, file=out)
print("done")
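How can I do this? Please give an example in code.
For example:
test = "כַּחֲצִי" is entirely Hebrew, so it should be written to the file.
If not every character is Hebrew, the word should not be added.
Example
Input text:
test = "כַּחֲצִי"
The output is the same: כַּחֲצִי
If a word contains non-Hebrew characters, it should be deleted:
test = "כַּtestחֲצִי"
This one is deleted, so the output is "" (none).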
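The word-level rule described above (keep a word only if every character in it is Hebrew) could be sketched roughly as follows. This is only an illustration, not a definitive solution: the function name `hebrew_words` and the choice of Unicode ranges (letters U+05D0 to U+05EA plus cantillation marks and niqqud U+0591 to U+05C7) are assumptions and may need adjusting for your data.

```python
import re

# Assumption: "Hebrew" means the letters U+05D0-U+05EA plus the
# cantillation marks and niqqud in U+0591-U+05C7.
HEBREW_WORD = re.compile(r"[\u0591-\u05C7\u05D0-\u05EA]+")

def hebrew_words(text):
    """Return the unique whitespace-separated words made only of Hebrew characters."""
    seen = set()
    result = []
    for word in text.split():
        # fullmatch succeeds only if the whole word is Hebrew
        if HEBREW_WORD.fullmatch(word) and word not in seen:
            seen.add(word)
            result.append(word)
    return result

if __name__ == "__main__":
    sample = "כַּחֲצִי כַּtestחֲצִי"
    # One word per line, as in the original script
    print("\n".join(hebrew_words(sample)))
```

With the two test strings from the question, the first word is kept and the mixed one is dropped. A word with punctuation attached (for example a trailing comma) would also be dropped, so stripping punctuation first, as the original script does, is probably still needed.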
alphabet = { "א","אִ","ב","בּ","ג","ד","ה","ם","ו","וּ","ן","ז","ח","חָ","ט","י","כ","ָך","ל","מ","נ","ס","ע","פ","ף","צ","ץ","ק","ר","ש","ת"} I'm basically looking for something that removes all the characters except these and leaves spaces
alphabet = " אאִבבּגדהםווּןזחחָטיכָךלמנסעפףצץקרשת"
def letters_only(source):
    # Keep only the characters that appear in alphabet (the space included)
    result = ""
    for i in source.lower():
        if i in alphabet:
            result += i
    return result

with open("random.txt", encoding='utf-8') as fname:
    text = fname.read()
test = letters_only(text)
# Append the filtered text to the output file and close it afterwards
with open("random-.txt", "a", encoding='utf-8') as out:
    print(test, file=out)
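As an alternative to listing every allowed character by hand in `alphabet`, the same character-level filtering (remove everything except Hebrew characters and leave the spaces) might be done with a single regular expression over Unicode ranges. This is a minimal sketch under assumptions: `keep_hebrew` is a hypothetical name, and the ranges U+0591 to U+05C7 and U+05D0 to U+05EA are a guess at what should count as Hebrew here.

```python
import re

# Assumption: anything outside these Hebrew ranges and whitespace is unwanted.
NON_HEBREW = re.compile(r"[^\u0591-\u05C7\u05D0-\u05EA\s]")

def keep_hebrew(text):
    """Delete every character that is neither Hebrew nor whitespace."""
    return NON_HEBREW.sub("", text)

if __name__ == "__main__":
    print(keep_hebrew("כַּtestחֲצִי 123 abc"))
```

Note that this keeps the Hebrew parts of a mixed word rather than dropping the whole word, which matches what `letters_only` does but differs from the word-level behaviour asked for in the examples.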