Open a text file and remove redundant lines and characters
I have a text file that contains words, characters, numbers, and blank lines, like this:
This is an example /
The Date is: 07 Feb 2022
2.03 4.0 5.0 2*6 3*4
9e-2 7.0 2 6 2*3 5.0 /
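I want to keep only the lines that consist of numbers and remove every / wherever it appears in the file. I also want to expand the numbers written next to a *, so that:
2*6 becomes 6 6
3*4 becomes 4 4 4
2*3 becomes 3 3
and end up with a file like this: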
2.03 4.0 5.0 6.0 6.0 4.0 4.0 4.0
9e-2 7.0 2.0 6.0 3.0 3.0 5.0
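To make the expansion rule concrete, here is a minimal sketch of how a single token could be handled. The helper name `expand_token` is made up for illustration, and the sketch assumes that every `count*value` token has an integer count on the left. Tokens without a `*` are passed through unchanged.

```python
def expand_token(token):
    """Expand 'count*value' into `count` copies of value; other tokens pass through."""
    if "*" in token:
        count, value = token.split("*")
        # float() normalises "6" to 6.0 so the output matches the desired format
        return [str(float(value))] * int(count)
    return [token]

line = "2.03 4.0 5.0 2*6 3*4"
expanded = [t for token in line.split() for t in expand_token(token)]
print(" ".join(expanded))  # prints: 2.03 4.0 5.0 6.0 6.0 4.0 4.0 4.0
```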
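I have written code that does this, but it takes a long time to run and I don't think it is a good approach.
First I open the text file, remove the /, and save the result as target1: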
import os
import time
start = time.time()
with open('Ptest1.txt', 'r') as infile1, open('target1.txt', 'w') as outfile1:
    A = infile1.read()
    B = A.replace("/", "")
    outfile1.write(B)
Then I open it again and use the following code to remove the lines that contain letters, as well as the blank lines, saving the result as target2:
keep_these = []
def is_valid(t):
    try:
        float(t.replace('*', '0'))
        return True
    except ValueError:
        pass
    return False
with open('target1.txt', encoding='utf-8') as infile2, open('target2.txt', 'w') as outfile2:
    for line in infile2:
        if all(is_valid(t) for t in line.strip().split()):
            keep_these.append(line)
            if line.strip():
                outfile2.write(line)
os.remove('target1.txt')
Finally, I use the code below to expand the numbers on either side of the *:
T = open('target2.txt', 'r')
def expand(a):
    import numpy as np
    b = a.readlines()
    c = [x.replace('\n','') for x in b]
    d = [j for i in c for j in i.split()]
    e=[]
    for i in d:
        if "*" in i:
            a=[i.split("*")[1]]*int(i.split("*")[0])
            e.extend(a)
        else:
            e.append(i)
    f = np.array(e)
    g = [float(numeric_string) for numeric_string in f]
    h = np.array(g)
    return h
MM = expand(T)
end = time.time()
print(f"Runtime of the program is {end - start}")
Here is what I tried:
import os
import time
import string
start = time.time()
def expand(chunk):
    l = chunk.split("*")
    chunk = [str(float(l[1]))] * int(l[0])
    return chunk
with open('/path/to/Test.txt', 'r') as infile1, open('target1.txt', 'w') as outfile1:
    for line in infile1:
        if set(string.ascii_letters.replace("e","")) & set(line):
            continue
        chunks = line.split(" ")
        #Get rid of newlines
        chunks = list(map(lambda chunk: chunk.strip(), chunks))
        if "/" in chunks:
            chunks.remove("/")
        new_chunks = []
        for i in range(len(chunks)):
            if '*' in chunks[i]:
                new_chunks += expand(chunks[i])
            else:
                new_chunks.append(chunks[i])
        new_chunks[len(new_chunks)-1] = new_chunks[len(new_chunks)-1]+"\n"
        new_line = " ".join(new_chunks)
        outfile1.write(new_line)
end = time.time()
print(f"Runtime of the program is {end - start}")
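I timed your program and mine using the text file provided in the comments, and got:
Runtime of the program is 1.3493189811706543
for mine, and
Runtime of the program is 4.36532998085022
for yours. Checking the diff, my code produces about four more lines than yours out of roughly 71k, so I don't think the results are far off.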
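If you want to see exactly which lines differ, a multiset comparison is one quick way to check. This is only a sketch: the function name `extra_lines` is hypothetical, and it assumes both programs have written their results as text lines in the same format.

```python
from collections import Counter

def extra_lines(reference_lines, candidate_lines):
    """Return the lines that occur more often in candidate than in reference."""
    # Counter subtraction keeps only positive counts, i.e. the surplus lines
    surplus = Counter(candidate_lines) - Counter(reference_lines)
    return sorted(surplus.elements())

# Small in-memory demo; with real files, pass the result of f.readlines() instead
reference = ["1.0 2.0\n", "3.0\n"]
candidate = ["1.0 2.0\n", "\n", "3.0\n", "\n"]
print(extra_lines(reference, candidate))  # the two blank lines
```

This ignores line order, which keeps it fast on a file of this size.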
One issue I noticed is that the result in MM is never written to a file. In any case, let me know whether the code above speeds things up for you.
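As a possible follow-up, all three steps can be folded into a single pass that also writes the expanded values to disk. This is a sketch rather than a drop-in replacement: `parse_line` and `clean_file` are names I made up, and it decides whether a line is numeric by trying `float()` on each token instead of scanning for letters.

```python
def parse_line(line):
    """Return the expanded floats on a line, or None if the line is not purely numeric."""
    values = []
    for token in line.replace("/", "").split():
        count, star, value = token.partition("*")
        try:
            if star:
                # "3*4" means three copies of 4.0
                values.extend([float(value)] * int(count))
            else:
                values.append(float(token))
        except ValueError:
            return None  # a word, a date, or some other non-numeric token
    return values or None  # blank lines produce no values

def clean_file(src_path, dst_path):
    """Write the filtered, expanded lines to dst_path and return all values as a list."""
    all_values = []
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            values = parse_line(line)
            if values is None:
                continue
            dst.write(" ".join(map(str, values)) + "\n")
            all_values.extend(values)
    return all_values
```

Calling something like `clean_file('Ptest1.txt', 'target.txt')` (the output name is just a placeholder) would produce the file, and the returned list can be wrapped in `np.array(...)` if you need an array like MM. Two caveats: `float()` also accepts strings such as `nan` and `inf`, so a line made up only of those would be kept, and a value like `9e-2` is written back as `0.09`.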