Unique words dictionary: remove special characters and numbers
I want to make a dictionary from a book, but unfortunately I ran into a problem:
import re

with open('vechny.txt', encoding='utf-8') as fname:
    text = fname.read()

# Deduplicate the whitespace-separated tokens
lst = list(set(text.split()))
str1 = ' '.join(str(e) for e in lst)

# Append the unique words to 1000.txt
with open("1000.txt", "a", encoding='utf-8') as out:
    print(str1, file=out)

# Read them back and write one word per line to file.txt
with open("1000.txt", "r", encoding='utf-8') as in_file:
    lines = in_file.read().split(' ')

with open("file.txt", "w", encoding='utf-8') as out_file:
    out_file.write("\n".join(lines))
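This script works well, but I also need to remove special characters such as ",", ".", "-", etc. from the plain text. For example, the text contains the word "Hay," and split treats it as a single word, so the comma is never removed.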
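To see why "Hay," survives, compare str.split() with a regex that matches only runs of letters. This is a small illustration, not code from the question; the pattern [^\W\d_]+ (a word character that is neither a digit nor an underscore, i.e. a Unicode letter) is our choice.

```python
import re

text = "Hay, hello,% lost. 15 čas řad"

# str.split() only breaks on whitespace, so punctuation stays glued to the words
tokens = text.split()
print(tokens)  # ['Hay,', 'hello,%', 'lost.', '15', 'čas', 'řad']

# Matching runs of letters instead leaves punctuation and digits behind
words = re.findall(r'[^\W\d_]+', text)
print(words)  # ['Hay', 'hello', 'lost', 'čas', 'řad']
```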
How do I turn this text?
Input:
Hay, hello,% lost. 15 čas řad
Desired output:
hay hello lost cas rad
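One way to get exactly the desired output using only the standard library is sketched below. It is a swapped-in technique, not taken from the answers: it assumes that dropping Unicode combining marks after NFKD decomposition is an acceptable way to remove diacritics. That turns č into c and ř into r, but letters without a decomposition (such as ł or ø) are left unchanged. normalize_words is a hypothetical helper name.

```python
import re
import unicodedata


def normalize_words(text):
    """Return the words of text lowercased, with diacritics, digits and punctuation removed."""
    # NFKD splits 'č' into 'c' plus a combining caron; drop the combining marks
    decomposed = unicodedata.normalize('NFKD', text)
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only runs of letters, so punctuation and digits disappear
    return re.findall(r'[^\W\d_]+', stripped.lower())


print(' '.join(normalize_words("Hay, hello,% lost. 15 čas řad")))  # hay hello lost cas rad
```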
Try this:
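import re
# Replace every run of characters that are not ASCII letters or digits with one space
print(re.sub(r'[^A-Za-z0-9]+', ' ', 'Hay, hello,% lost. 15'))  # Hay hello lost 15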
Let me know if it works!
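One caveat with the answer above: its character class only lists ASCII letters, so on the full sample input the Czech letters č and ř count as "special characters" and are cut out of the words. The sketch below shows that, together with a Unicode-aware variant of our own (the [\W\d_]+ class is an assumption, not from the answer) that replaces runs of non-letters instead.

```python
import re

text = "Hay, hello,% lost. 15 čas řad"

# The answer's ASCII-only class removes č and ř along with the punctuation
ascii_only = re.sub(r'[^A-Za-z0-9]+', ' ', text)
print(ascii_only)  # Hay hello lost 15 as ad

# Unicode-aware variant: replace every run of non-letters (punctuation, digits, underscore)
letters_only = re.sub(r'[\W\d_]+', ' ', text).strip()
print(letters_only)  # Hay hello lost čas řad
```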
How about this?
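import re

str1 = '#@-/abcüšščřžý'
# Match tokens containing at least one letter: optional leading digits, one letter,
# then any letters/digits; punctuation such as #@-/ is not part of any match
r = re.findall(r'\b\d*[^\W\d_][^\W_]*\b', str1, re.UNICODE)
str2 = ''.join(r)
print(str2)  # abcüšščřžý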
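Applied to the question's sample input (plus a made-up mixed token "3d" added for illustration), the same pattern shows what it keeps and drops: a pure number such as 15 is skipped because the pattern requires at least one letter, while a token that mixes digits and letters is kept whole with its digits. A separate digit-stripping step is therefore still needed if such tokens should lose their digits.

```python
import re

# Pattern from the answer above: optional leading digits, then at least one letter,
# then any mix of letters and digits, between word boundaries
WORD = r'\b\d*[^\W\d_][^\W_]*\b'

sample = "Hay, hello,% lost. 15 čas řad 3d"
matches = re.findall(WORD, sample)
print(matches)  # ['Hay', 'hello', 'lost', 'čas', 'řad', '3d']
```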
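import random
import re

from unidecode import unidecode  # third-party package: pip install Unidecode

# Random number used in the output file name (renamed so it no longer shadows the random module)
suffix = random.randint(1000, 2000)
n = input("jmenosouboru:")  # Czech for "file name:"

with open(n + ".txt", encoding='utf-8') as fname:
    text = fname.read()

# Keep only tokens that contain at least one letter; punctuation and pure numbers are dropped
r = re.findall(r'\b\d*[^\W\d_][^\W_]*\b', text, re.UNICODE)
# Transliterate to ASCII (č -> c, ř -> r) and lowercase to match the desired output
uni = unidecode(' '.join(r)).lower()
# Strip leftover digits before deduplicating, so "abc1" and "abc" end up as one entry
words = {''.join(ch for ch in w if not ch.isdigit()) for w in uni.split()}
words.discard('')

with open(str(suffix) + "-.txt", "a", encoding='utf-8') as out:
    print("\n".join(words), file=out)
print("done")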
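The script above collects the words in a set, whose iteration order is arbitrary, and opens its output file in append mode ("a"), so the word order changes between runs and re-running it against the same file name would add a second copy. A minimal sketch of an alternative output step, assuming a sorted file that is overwritten on each run is what you want; write_dictionary is a hypothetical helper, and sorted() orders by Unicode code point, not by Czech alphabetical rules.

```python
import tempfile
from pathlib import Path


def write_dictionary(words, path):
    """Write the unique words to path, one per line, sorted by code point.

    Uses write mode, so re-running replaces the old dictionary
    instead of appending another copy to it.
    """
    unique_sorted = sorted(set(words))
    Path(path).write_text("\n".join(unique_sorted) + "\n", encoding="utf-8")
    return unique_sorted


# Usage: write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "dictionary.txt"
    write_dictionary(["lost", "hay", "rad", "hay", "cas", "hello"], out_path)
    contents = out_path.read_text(encoding="utf-8")

print(contents.splitlines())  # ['cas', 'hay', 'hello', 'lost', 'rad']
```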