Open a text file and remove redundant lines and characters

I have a text file containing words, characters, numbers, and blank lines, like this:

This is an example /
The Date is: 07 Feb 2022
2.03 4.0 5.0 2*6 3*4
9e-2 7.0 2 6 2*3 5.0 /

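I want to keep only the lines that contain numbers and remove every / wherever it appears in the file. I also want to expand the tokens that contain *, where count*value means the value repeated count times, so that:

2*6 becomes 6 6

3*4 becomes 4 4 4

2*3 becomes 3 3

and I end up with a file like this: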

2.03 4.0 5.0 6.0 6.0 4.0 4.0 4.0
9e-2 7.0 2.0 6.0 3.0 3.0 5.0
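
To make the expansion rule above concrete, here is a minimal sketch of it for a single token. The function name `expand_token` is a hypothetical helper introduced only for illustration, and it assumes the part before the * is always an integer count.

```python
def expand_token(token):
    """Expand 'count*value' into `count` copies of value; return other tokens unchanged."""
    if "*" in token:
        # Assumption: the text before '*' is an integer repeat count.
        count, value = token.split("*", 1)
        return [value] * int(count)
    return [token]


print(expand_token("3*4"))  # ['4', '4', '4']
print(expand_token("5.0"))  # ['5.0']
```

The values are kept as strings here; converting them to float is a separate step.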

I have written code that does this, but it takes a long time to run, and I don't think it is a good approach.

First, I open the text file, remove every /, and save the result as target1:

import os
import time
start = time.time()

with open('Ptest1.txt', 'r') as infile1, open('target1.txt', 'w') as outfile1:
    A = infile1.read()
    B = A.replace("/", "")
    outfile1.write(B)

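Then I open it again, and this time I use the following code to remove the lines that contain letters as well as the blank lines, saving the result as target2: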

keep_these = []
def is_valid(t):
    try:
        float(t.replace('*', '0'))
        return True
    except ValueError:
        pass
    return False

with open('target1.txt', encoding='utf-8') as infile2, open('target2.txt', 'w') as outfile2:
    for line in infile2:
        if all(is_valid(t) for t in line.strip().split()):
            keep_these.append(line)
            if line.strip():
                outfile2.write(line)

os.remove('target1.txt')

Finally, I use the code below to expand the numbers on either side of the *:

T = open('target2.txt', 'r')

def expand(a):
    import numpy as np
    b = a.readlines()
    c = [x.replace('\n','') for x in b]
    d = [j for i in c for j in i.split()]
    e=[]
    for i in d:
        if "*" in i:
            a=[i.split("*")[1]]*int(i.split("*")[0])
            e.extend(a)
        else:
            e.append(i)
    f = np.array(e)
    g = [float(numeric_string) for numeric_string in f]
    h = np.array(g)
    return h

MM = expand(T)

end = time.time()
print(f"Runtime of the program is {end - start}")

Here is what I tried:

import os
import time
import string

start = time.time()
def expand(chunk):

    l = chunk.split("*")
    chunk = [str(float(l[1]))] * int(l[0])

    return chunk

with open('/path/to/Test.txt', 'r') as infile1, open('target1.txt', 'w') as outfile1:
    for line in infile1:
        if set(string.ascii_letters.replace("e","")) & set(line):
            continue

        chunks = line.split(" ")
        #Get rid of newlines
        chunks = list(map(lambda chunk: chunk.strip(), chunks))
        if "/" in chunks:
            chunks.remove("/")

        new_chunks = []
        for chunk in chunks:
            if '*' in chunk:
                new_chunks += expand(chunk)
            else:
                new_chunks.append(chunk)
        # Join the tokens and terminate the line
        new_line = " ".join(new_chunks) + "\n"
        outfile1.write(new_line)

end = time.time()
print(f"Runtime of the program is {end - start}")

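I timed your program and mine using the text file provided in the comments, and I got:

Runtime of the program is 1.3493189811706543

for mine and

Runtime of the program is 4.36532998085022

for yours. Comparing the outputs, my code produced about four more lines than yours out of roughly 71k, so I don't think the results are far off.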
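
The timings above were taken with time.time(), which works. For repeatable comparisons, a small helper along these lines may be a bit more robust. It is only a sketch: `time_call` is a hypothetical name, and taking the best of three runs is an arbitrary choice.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Example: time a built-in on some throwaway data.
print(f"best of 3: {time_call(sum, range(1_000_000)):.4f} s")
```

Taking the minimum rather than the mean reduces the influence of other processes running on the machine.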

One issue I noticed is that the result MM is never written to a file at the end. In any case, let me know whether the code above speeds things up.
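
For comparison, the whole task can also be sketched as a single pass that filters, expands, and writes in one go, without intermediate files. This is an untimed sketch under a few assumptions: `clean_line` and `clean_file` are hypothetical names, a line is dropped as soon as any token fails to parse as a number, and values are written with str(float), so 9e-2 comes out as 0.09 rather than in its original spelling.

```python
def clean_line(line):
    """Return the floats on a line, or None if the line should be dropped."""
    values = []
    # Treat every '/' as whitespace so it disappears wherever it occurs.
    for token in line.replace("/", " ").split():
        count, star, value = token.rpartition("*")
        try:
            repeat = int(count) if star else 1
            values.extend([float(value)] * repeat)
        except ValueError:
            return None  # non-numeric token: drop the whole line
    # Blank lines, and lines that held only '/', yield no values.
    return values or None


def clean_file(src_path, dst_path):
    """Stream src_path line by line and write the cleaned lines to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            values = clean_line(line)
            if values is not None:
                dst.write(" ".join(map(str, values)) + "\n")


print(clean_line("9e-2 7.0 2 6 2*3 5.0 /"))
```

One caveat: float() also accepts words such as nan and inf, so a line consisting only of those would be kept. If the input can contain them, an extra check would be needed.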