Python 3 - doctest module for long .txt files
I have a quick question about the doctest module in Python 3. I haven't used it yet; I have only read some material about it and how to apply it to functions. I have two functions to test, and I think I understand what I have to do, but I don't know how to apply it in my case. My functions work with .txt files. The first takes a word and the path to a text file as input, and outputs the lines in which the word appears along with the corresponding line numbers.
def find_all_instances(word, path):
    l = []
    with open(path, 'r') as file:
        for position, line in enumerate(file.readlines()):
            if word in line:
                tup = (line, position + 1)
                l.append(tup)
    return l

print(find_all_instances('word', 'filename.txt'))
The second function takes the file path of a text file as input and outputs a list of pairs, each consisting of a word and the number of times that word appears in the text, in descending order.
from collections import Counter
import re

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        data = (x.lower() for x in re.split(r'[\n, .?!:;-]', f.read()) if x.isalpha())
    cnt = Counter(data)
    return cnt.most_common()

task_2(r"filepath")
My question now is: how do I apply doctest in these cases? The doctest examples I have seen all use simple functions, such as multiplying two inputs. In my case, however, the output would be rather large, since the text file is about 10,000 lines long, so the output is correspondingly large. How can I test these functions?
I suggest you create functions that actually do the work of generating the lists, and document those:
from collections import Counter
import re

def find_all_instances(word, lines):
    """Return a list of (line, line_number) tuples for the lines in which the word appears.

    >>> find_all_instances('test', ['first line', 'second line for test', 'third test line', 'last line'])
    [('second line for test', 2), ('third test line', 3)]
    """
    l = []
    for position, line in enumerate(lines):
        if word in line:
            l.append((line, position + 1))
    return l

def word_counter(text):
    r"""Return a list of (word, count) tuples for each word in a text, sorted from most to least common.

    The docstring is raw so that the \n escapes in the example below reach
    doctest as two characters instead of breaking the example across lines.
    Ties are listed in first-encountered order (guaranteed since Python 3.7).

    >>> word_counter('Lorem ipsum dolor sit amet, consectetur adipiscing elit.\nSed non risus.\n Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor.')
    [('dolor', 2), ('sit', 2), ('amet', 2), ('adipiscing', 2), ('sed', 2), ('lorem', 1), ('ipsum', 1), ('consectetur', 1), ('elit', 1), ('non', 1), ('risus', 1), ('suspendisse', 1), ('lectus', 1), ('tortor', 1), ('dignissim', 1), ('nec', 1), ('ultricies', 1)]
    """
    data = (x.lower() for x in re.split(r'[\n, .?!:;-]', text) if x.isalpha())
    return Counter(data).most_common()
Then use them in other functions that handle opening the file:
def find_all_instances_from_path(word, path):
    with open(path, 'r') as file:
        return find_all_instances(word, file.readlines())

def task_2(inp):
    with open(inp, encoding="utf-8") as f:
        return word_counter(f.read())