Extract string from txt file and add it to list - Python3

Question

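I have multiple text files in a folder that contain a lot of random data and code. I am trying to extract the text that sits between a specific start string and a specific end string (I guess there is a better way to escape the quotes below).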

start = '\" alt=\"\" aria-label=\"'
end = '\"'

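I plan to use the code below to go through the text files in the directory, but I don't know how to extract the strings and append them to a list.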

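import os

for filename in os.listdir(path):
    if filename.endswith(".txt"):
        fullpath = os.path.join(path, filename)

        # open the file only when it is a .txt file; pass the variable, not the string "fullpath"
        with open(fullpath, "r") as file:
            # extract strings
            # my_list.append(extracted_strings)
            pass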

Answer 1

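This is called file handling. You open a file with the open function (the file name or path, then the mode: r for reading, w for writing, a for appending). Applied to your code: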

with open(fullpath, 'r') as f:
    # readlines must be called with (); this appends the file's list of lines to x
    x.append(f.readlines())
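To make the three modes mentioned above concrete, here is a minimal sketch. The helper name read_lines and the temporary demo file are assumptions made for illustration only; they are not part of the question.

```python
import os
import tempfile

def read_lines(fullpath):
    """Return the lines of a text file as a list, without trailing newlines."""
    with open(fullpath, "r", encoding="utf-8") as f:  # "r" opens for reading
        return [line.rstrip("\n") for line in f]

# Demo on a throwaway file so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:  # "w" creates or overwrites
        f.write("first\n")
    with open(demo_path, "a", encoding="utf-8") as f:  # "a" appends to the end
        f.write("second\n")
    demo_lines = read_lines(demo_path)

print(demo_lines)
```

Note that this only gets the raw lines into a list; pulling out the text between the two markers still needs a search step, for example with a regular expression.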

Answer 2

Try this (assuming your start and end strings are valid regex patterns). It uses Python's built-in re module.

import re
pattern = start + r'\s*(.*?)\s*' + end  # non-greedy group captures the text between start and end
pattern = re.compile(pattern) # for speeding up 
re.findall(pattern, text_from_file)
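If start and end are meant as literal text rather than regex patterns, one option is to pass them through re.escape first. The following is only a sketch: build_pattern and the sample HTML string are made up for illustration. It also shows that double quotes need no backslashes inside a single-quoted Python string, which addresses the escaping concern in the question.

```python
import re

# Double quotes do not need escaping inside single-quoted strings.
start = '" alt="" aria-label="'
end = '"'

def build_pattern(start, end):
    """Compile a pattern that captures the shortest text between two literal markers."""
    # re.escape neutralises any regex metacharacters in the markers.
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end))

# Hypothetical sample input, standing in for the contents of one text file.
sample = (
    '<img src="a.png" alt="" aria-label="Red shoes">'
    '<img src="b.png" alt="" aria-label="Blue hat">'
)
labels = build_pattern(start, end).findall(sample)
print(labels)
```

By default . does not match a newline, so a value that spans several lines would be missed; passing re.DOTALL to re.compile changes that if it is needed.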

Use the glob library to get a list of files with a specific extension.

from glob import glob
# get a list of target files
files = glob("path/to/files/*.txt")
results = list()
# keep track of files without matches
nonmatched = list()

for file in files: 
    # open and access file-content
    with open(file, 'r') as f:
        text_from_file = f.read()
    # search for patterns
    result = re.findall(pattern, text_from_file)
    # append to results only if non-empty 
    # search-result found
    if result:
        results.append(result)
    else:
        nonmatched.append(file)

print(f"Total {len(results)}/{len(files)} files were found with matching results. \nTotal matched cases: {sum(len(result) for result in results)}")
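The loop above stores one list of matches per file. If a single flat list is wanted, as the question suggests, a variation along these lines might work. This is a sketch under assumptions: extract_from_folder is a hypothetical helper, the markers are treated as literal text, and the files are assumed to be readable as UTF-8.

```python
import re
import tempfile
from pathlib import Path

def extract_from_folder(folder, start, end):
    """Collect every substring found between start and end in the folder's .txt files."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end))
    my_list = []
    for txt_file in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps going if a file contains undecodable bytes
        text = txt_file.read_text(encoding="utf-8", errors="replace")
        my_list.extend(pattern.findall(text))  # extend keeps the list flat
    return my_list

# Demo with throwaway files; only .txt files are scanned.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text('x" alt="" aria-label="One"y', encoding="utf-8")
    Path(tmp, "b.txt").write_text("no match here", encoding="utf-8")
    Path(tmp, "c.log").write_text('" alt="" aria-label="Skipped"', encoding="utf-8")
    found = extract_from_folder(tmp, '" alt="" aria-label="', '"')

print(found)
```

Using extend instead of append is the design choice here: append would nest each file's matches as a sub-list, while extend adds the individual strings.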

python

text-extraction

python-3.x