Hebrew dictionary: remove all non-Hebrew characters

Question

I want to create a Hebrew dictionary.

I want to add a pattern that removes everything containing non-Hebrew characters.

import random
import re

# Random number used to build the output file name
# (named suffix so it does not shadow the random module).
suffix = random.randint(1000, 2000)

name = input("HebrewFileName?:")

with open(name + ".txt", encoding='utf-8') as fname:
    text = fname.read()

# Strip punctuation, then keep each distinct word once.
res = re.sub(r'[!,*)@|#%(&$_?.^]', '', text)
words = set(res.split())
# Drop words that start with Latin letters, then remove all digits.
words = [w for w in words if not re.match(r'[a-zA-Z]+', w)]
text2 = ''.join(ch for ch in "\n".join(words) if not ch.isdigit())

# Append the result to the output file and close it properly.
with open(str(suffix) + "-.txt", "a", encoding='utf-8') as out:
    print(text2, file=out)
print("done")

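How can I do this? Please give an example in code.

For example:

test = "כַּחֲצִי": if it is Hebrew, write it to the file.

If not all of the characters are Hebrew, do not add it.

Example: for the input text test = "כַּחֲצִי" the output is the same, כַּחֲצִי

If a word contains non-Hebrew characters, delete it: test = "כַּtestחֲצִי" is deleted and the output is "" (nothing).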

 alphabet = {   "א","אִ","ב","בּ","ג","ד","ה","ם","ו","וּ","ן","ז","ח","חָ","ט","י","כ","ָך","ל","מ","נ","ס","ע","פ","ף","צ","ץ","ק","ר","ש","ת"} I'm basically looking for something that removes all the characters except these and leaves spaces

Answer 1

alphabet = " אאִבבּגדהםווּןזחחָטיכָךלמנסעפףצץקרשת" 
def letters_only(source):
    # Keep only the characters listed in alphabet
    # (Hebrew letters, a few vowel marks, and the space).
    return "".join(ch for ch in source if ch in alphabet)


# Read random.txt, filter it, and append the result to random-.txt.
with open("random.txt", encoding='utf-8') as fname:
    text = fname.read()

with open("random-.txt", "a", encoding='utf-8') as out:
    print(letters_only(text), file=out)
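
Note that letters_only filters character by character, so a mixed word such as "כַּtestחֲצִי" is trimmed down to its Hebrew characters rather than dropped. If whole words should be discarded instead, as the question's last example asks, a minimal sketch could look like the following. It is not part of the original answer. It assumes that "Hebrew" means any code point in the Unicode Hebrew block (U+0590 to U+05FF), which covers the letters, final forms, and vowel points, but also a few Hebrew punctuation marks. The names HEBREW_WORD, keep_hebrew_words, and unique_hebrew_words are made up for illustration.

```python
import re

# Assumption: a word counts as Hebrew when every character is in the
# Unicode Hebrew block (U+0590-U+05FF). This includes niqqud (vowel points)
# and some Hebrew punctuation; presentation forms (U+FB1D-U+FB4F) are not covered.
HEBREW_WORD = re.compile(r'[\u0590-\u05FF]+')


def keep_hebrew_words(text):
    """Return the whitespace-separated words made up only of Hebrew characters."""
    return [w for w in text.split() if HEBREW_WORD.fullmatch(w)]


def unique_hebrew_words(text):
    """Same as keep_hebrew_words, but each word appears once, in first-seen order."""
    return list(dict.fromkeys(keep_hebrew_words(text)))


# The pure Hebrew word is kept; the mixed word, the Latin word,
# and the number are dropped.
print(keep_hebrew_words("כַּחֲצִי כַּtestחֲצִי abc 123"))
```

The resulting list can then be written to a file with "\n".join(...), in the same way as the code above. Using a code point range avoids listing every letter and vowel mark by hand, at the cost of also accepting Hebrew punctuation.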


python

dictionary