Python 3 - doctest module for long .txt files

Question

I have a quick question about the doctest module in Python 3.

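I haven't used it yet; I have only read some material about it and how it is applied to functions. I have two functions to test, and I think I understand what I need to do. However, I don't know how to apply it in my case, because my functions work with .txt files. The first takes a word and the path to a text file as input, and outputs the lines in which the word occurs together with the corresponding line numbers.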

def find_all_instances(word, path):
    l = []
    with open(path, 'r') as file:
        for position, line in enumerate(file.readlines()):
            if word in line:
                tup = (line, position+1)
                l.append(tup)
        return l
print(find_all_instances('word', 'filename.txt'))

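The second function takes the file path of a text file as input and outputs a list of pairs, each consisting of a word and the number of times that word occurs in the text, sorted in descending order of count.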

from collections import Counter
import re

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        data = (x.lower() for x in re.split(r'[\n, .?!:;-]', f.read()) if x.isalpha())
    cnt = Counter(data)
    return cnt.most_common()

task_2(r"filepath")

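My question now is: how do I apply doctest in these cases? The doctest examples I have seen only use simple functions, such as multiplying two inputs. In my case, however, the output is quite large: the text file is about 10,000 lines long, so the output is correspondingly large. How can I do this for these functions?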

Answer 1

I suggest you create functions that do the actual work of building the lists, and document those:

from collections import Counter
import re

def find_all_instances(word, lines):
    """Returns a list of tuples (line, line_number) for the lines where the word appears.

    >>> find_all_instances('test', ['first line', 'second line for test', 'third test line', 'last line'])
    [('second line for test', 2), ('third test line', 3)]

    """
    l = []
    for position, line in enumerate(lines):
        if word in line:
            tup = (line, position+1)
            l.append(tup)
    return l

def word_counter(text):
    r"""Returns a list of tuples (word, count) for each word in a text, most common first.

    Words with equal counts keep the order in which they first appear (Python 3.7+).
    The docstring is raw (r-prefixed) so that the \n in the example reaches doctest unchanged.

    >>> word_counter('Lorem ipsum dolor sit amet, consectetur adipiscing elit.\nSed non risus.\n Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor.')
    [('dolor', 2), ('sit', 2), ('amet', 2), ('adipiscing', 2), ('sed', 2), ('lorem', 1), ('ipsum', 1), ('consectetur', 1), ('elit', 1), ('non', 1), ('risus', 1), ('suspendisse', 1), ('lectus', 1), ('tortor', 1), ('dignissim', 1), ('nec', 1), ('ultricies', 1)]

    """
    data = (x.lower() for x in re.split(r'[\n, .?!:;-]', text) if x.isalpha())
    cnt = Counter(data)
    return cnt.most_common()
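
Because these functions take plain lists and strings rather than file paths, their docstring examples can use a few short sample lines instead of the 10,000-line file. As a minimal sketch of how such examples might be executed, the block below repeats a condensed version of find_all_instances (rewritten as a list comprehension for brevity, which is my own shortening and not part of the answer) and runs its examples with doctest.testmod():

```python
import doctest


def find_all_instances(word, lines):
    """Return (line, line_number) tuples for the lines that contain word.

    >>> find_all_instances('test', ['first line', 'second line for test'])
    [('second line for test', 2)]
    >>> find_all_instances('absent', ['first line'])
    []
    """
    # Condensed form of the loop in the answer; line numbers start at 1.
    return [(line, number)
            for number, line in enumerate(lines, start=1)
            if word in line]


if __name__ == "__main__":
    # Runs every docstring example in this module; prints nothing when all pass.
    doctest.testmod()
```

Running the file directly executes the examples. Alternatively, python -m doctest -v yourfile.py (the file name is a placeholder) should do the same and list each example as it runs.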

Then use them in other functions that handle opening the file:

def find_all_instances_from_path(word, path):
    with open(path, 'r') as file:
        return find_all_instances(word, file.readlines())

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        return word_counter(f.read())
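
If the file-handling wrapper itself should also be covered, one possible approach is to have the doctest write a tiny temporary file and search that. The sketch below assumes a hypothetical helper named write_sample (not part of the answer) and again uses a condensed find_all_instances. Note that lines read with readlines() keep their trailing newline, so the expected output contains \n, and the docstring is raw so the backslashes survive:

```python
import doctest
import os
import tempfile


def write_sample(text):
    """Hypothetical helper: write text to a temporary .txt file, return its path."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(text)
    return tmp.name


def find_all_instances(word, lines):
    """Condensed version of the function from the answer."""
    return [(line, number)
            for number, line in enumerate(lines, start=1)
            if word in line]


def find_all_instances_from_path(word, path):
    r"""Search the file at path; matching lines keep their trailing newline.

    >>> path = write_sample('first line\nsecond line for test\n')
    >>> find_all_instances_from_path('test', path)
    [('second line for test\n', 2)]
    >>> os.remove(path)
    """
    with open(path, 'r') as file:
        return find_all_instances(word, file.readlines())


if __name__ == "__main__":
    # The examples see this module's globals, so write_sample and os are available.
    doctest.testmod()
```

This keeps the expected output to a single short line while still exercising the real open() call, at the cost of touching the file system during the test.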
