Open every file/subfolder in a directory and print the results to a .txt file

Question

Currently I am using this code:

from bs4 import BeautifulSoup
import contextlib
import glob
import os
import re
import sys


@contextlib.contextmanager
def stdout2file(fname):
    f = open(fname, 'w', encoding="utf8")
    old_stdout, sys.stdout = sys.stdout, f
    try:
        yield
    finally:
        # Restore the console and close the file even if an exception is raised
        sys.stdout = old_stdout
        f.close()

def trade_spider():
    os.chdir(r"C:\Users30p\FLO'S DATEIEN\Master FAU\Sommersemester 2016_Masterarbeit_Testumgebung_Probedateien für Analyseaspekt\Independent Auditors Report")
    with stdout2file("output.txt"):
        # '**' together with recursive=True also descends into subfolders (Python 3.5+)
        for file in glob.iglob('**/*.html', recursive=True):
            with open(file, encoding="utf8") as f:
                soup = BeautifulSoup(f.read(), "html.parser")
            for item in soup.find_all("ix:nonfraction"):
                # .get() avoids a KeyError for tags without a name attribute
                if re.match(".*AuditFeesExpenses", item.get('name', '')):
                    # Print only the file name, not the full path
                    print(os.path.basename(file), end="| ")
                    print(item['name'], end="| ")
                    print(item.get_text())

trade_spider()

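So far, everything works. But now I am stuck on another problem. If I search a folder that contains only files and no subfolders, there is no problem. However, if I try to run this code on a folder that has subfolders, it doesn't work (it doesn't print anything!). I would also like my results to be printed to a .txt file, without the whole path in it. The result should look like this: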

Filename.html| RegEX Match| HTML text

I did already get this result, but only in PyCharm, not in a separate .txt file.

To sum up, I have two questions:

How can I iterate through the subfolders of the directory I defined? -> Would os.walk() be an option?
How can I print the results to a .txt file? -> Would sys.stdout solve this?

Any help on this is appreciated!

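Update: It only prints the first result of the first file to my "output.txt" file (at least I think it's the first one, because it is the last file in my only subfolder and recursive=True is enabled). Any idea why it doesn't iterate over all the other files?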

UPDATE_2: Problem solved! The final code can be seen above!

Answer 1

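To iterate through subdirectories, there are two options:

Use ** with glob and the argument recursive=True (glob.glob('**/*.html', recursive=True)). This only works in Python 3.5+. If the directory tree is large, I would also recommend glob.iglob instead of glob.glob.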
Use os.walk and filter the file names for ".html" either manually or with fnmatch.filter.

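A minimal sketch of both options using only the standard library. The function names html_files_walk and html_files_glob are hypothetical; each yields the paths of all .html files below root, including files in nested subfolders. One difference worth knowing: glob skips names that start with a dot by default, while os.walk does not.

```python
import fnmatch
import glob
import os


def html_files_walk(root):
    """Yield paths of all .html files under root (hypothetical helper, os.walk version)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(filenames, "*.html"):
            yield os.path.join(dirpath, name)


def html_files_glob(root):
    """Same result with glob; '**' plus recursive=True requires Python 3.5+."""
    # '**' matches zero or more directories, so files directly in root are included
    pattern = os.path.join(root, "**", "*.html")
    return glob.iglob(pattern, recursive=True)
```

Either generator could stand in for the glob.iglob call in the question's loop, e.g. for file in html_files_walk("."):.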
As for printing to a file, again there are several ways:

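Just run the script and redirect stdout, i.e. python3 myscript.py >myfile.txt
Replace the calls to print with calls to the .write() method of a file object opened in write mode.
Keep using print, but pass it the argument file=myfile, where myfile is again a writable file object.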
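The shell redirect needs no code change at all; the other two options could look roughly like this. This sketch assumes the matches have already been collected as (path, regex match, HTML text) tuples, and the names write_with_print and write_with_write are hypothetical. Both write lines in the Filename.html| RegEX Match| HTML text format, using os.path.basename to drop the directory part of the path.

```python
import os


def write_with_print(rows, out_path):
    """rows: iterable of (path, regex_match, html_text) tuples (assumed input shape)."""
    with open(out_path, "w", encoding="utf8") as out:
        for path, name, text in rows:
            # file=out sends print's output to the file instead of the console
            print(os.path.basename(path), name, text, sep="| ", file=out)


def write_with_write(rows, out_path):
    """Same output via the file object's .write() method."""
    with open(out_path, "w", encoding="utf8") as out:
        for path, name, text in rows:
            # .write() adds no newline, so append it explicitly
            out.write("| ".join([os.path.basename(path), name, text]) + "\n")
```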

Edit: Perhaps the least intrusive approach is the following. First, include this somewhere:

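import contextlib
import sys


@contextlib.contextmanager
def stdout2file(fname):
    """Redirect everything print() writes to the file fname."""
    f = open(fname, 'w', encoding="utf8")
    old_stdout, sys.stdout = sys.stdout, f
    try:
        yield
    finally:
        # Restore the previous stdout and close the file even if an error occurs
        sys.stdout = old_stdout
        f.close()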

Then, directly above the loop over the files, add this line (and indent the loop beneath it accordingly):

with stdout2file("output.txt"):
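Since Python 3.4 the standard library also ships contextlib.redirect_stdout, which does the same job as stdout2file and restores sys.stdout when the with block exits. A sketch using it in place of the custom helper; print_results is a hypothetical stand-in for the question's loop over the HTML files, and run_to_file is likewise only illustrative:

```python
import contextlib


def print_results(rows):
    """Hypothetical stand-in for the loop over the HTML files: it only calls print()."""
    for row in rows:
        print("| ".join(row))


def run_to_file(out_path, rows):
    # redirect_stdout swaps sys.stdout for the open file and restores it on exit,
    # even if print_results raises; the file is closed afterwards by open()'s context
    with open(out_path, "w", encoding="utf8") as out, contextlib.redirect_stdout(out):
        print_results(rows)
```

Calling run_to_file("output.txt", rows) leaves one Filename.html| RegEX Match| HTML text line per row in the file, while later print() calls go to the console again.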


Tags: text-files, subdirectory, pycharm, python-3.x