Use only a certain portion of a file in every iteration

Question

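I am using an external API for Python (specifically 3.x) to get search results based on certain keywords stored in a .txt file. However, because of a limit on how many keywords I can search for in each time interval (assume I have to wait an hour between runs of the script), I can only use a portion of the keywords (say 50) per run. How can I, Pythonically, use only a portion of the keywords in every iteration?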

Let's assume I have the following list of keywords in the .txt file myWords.txt:

Lorem #0
ipsum #1
dolor #2
sit   #3
amet  #4
...
vitae #167

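I'd like to use only the keywords found on lines 0-49 (i.e. the first 50 lines) in the first iteration, 50-99 in the second, 100-149 in the third, and 150-167 in the fourth and last iteration.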

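This is, of course, possible by reading the whole file, reading an iteration counter saved elsewhere, and then choosing the keyword range that falls in that part of the complete list. However, I do not want an external counter. I would rather have only my Python script and myWords.txt, with the counter dealt with in the code itself.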

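I would like to take only the keywords that I should be using in the current run of the script (depending on (total number of keywords)/50). At the same time, if I add any new keywords at the end of myWords.txt, the iterations should adjust accordingly and, if needed, new iterations should be added.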

Answer 1

Try this. Modify it for your needs.

$ cat foo
1
2
3
4
5
6
7
8
9
10

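$ cat getlines.py
def getlines(filename, limit):
    """Yield lists of up to `limit` stripped lines from the file."""
    with open(filename, 'r') as handle:
        keys = []
        for idx, line in enumerate(handle):
            # Every `limit` lines, hand back the chunk collected so far.
            if idx % limit == 0 and idx != 0:
                yield keys
                keys = []
            keys.append(line.strip())
        # Yield the final (possibly shorter) chunk as well.
        if keys:
            yield keys

print(list(getlines('foo', 2)))
print(list(getlines('foo', 3)))
print(list(getlines('foo', 4)))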
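
The same chunking can also be written with itertools.islice. This is only a minimal sketch: chunked_lines and nth_chunk are hypothetical names that do not appear in the answer above. It also yields the final, shorter chunk, and nth_chunk shows one way to pick out just the chunk for the current run.

```python
from itertools import islice


def chunked_lines(filename, limit):
    """Yield successive lists of up to `limit` stripped lines from `filename`."""
    with open(filename, 'r') as handle:
        while True:
            # islice pulls at most `limit` more lines from the open file.
            keys = [line.strip() for line in islice(handle, limit)]
            if not keys:
                return
            yield keys


def nth_chunk(filename, limit, n):
    """Return chunk number `n` (0-based), or an empty list if there is none."""
    return next(islice(chunked_lines(filename, limit), n, None), [])
```

With the 168-line myWords.txt from the question, nth_chunk('myWords.txt', 50, 3) would return the keywords on lines 150-167. The script still has to learn which n to use on each run, which is the part the question asks about.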

Answer 2

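As far as I know, there is no way to remember which keywords were used between different invocations of your script. However, you do have a few choices for how to implement "persistent storage" of the information you need across invocations.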

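1. Instead of having just one input file named myWords.txt, you could have two files: one containing the keywords you still want to search for and one containing the keywords you have already searched for. As you search for keywords, you remove them from the first file and place them in the second.
2. You could implement a persistent storage strategy that stores the words themselves.
3. (The simplest thing, and what I would do) Just have a file called next_pos.txt and store the position where the last iteration stopped.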
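
The first option (two files) could be sketched roughly as follows. The function name take_keywords and the idea of passing both file paths as arguments are assumptions made for illustration. The sketch also assumes one keyword per line and skips blank lines.

```python
def take_keywords(pending_path, done_path, count):
    """Move up to `count` keywords from the pending file to the done file."""
    with open(pending_path) as fh:
        pending = [line.strip() for line in fh if line.strip()]

    batch, remaining = pending[:count], pending[count:]

    # Append this batch to the file of already-searched keywords.
    with open(done_path, 'a') as fh:
        for keyword in batch:
            fh.write(keyword + '\n')

    # Rewrite the pending file with whatever is left.
    with open(pending_path, 'w') as fh:
        for keyword in remaining:
            fh.write(keyword + '\n')

    return batch
```

Keywords appended to the pending file later are picked up by subsequent runs, since each run simply takes whatever is at the top of that file.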

Here's an implementation of what I would do:

Create a next position file

echo 0 > next_pos.txt

Now do your work

with open('next_pos.txt') as fh:
    next_pos = int(fh.read().strip())

rows_to_search = 2  # This would be 50 in your case
keywords = []
with open('myWords.txt') as fh:
    # Jump to where the previous run stopped.
    fh.seek(next_pos)
    for _ in range(rows_to_search):
        line = fh.readline()
        if not line:  # End of file: no keywords left
            break
        keywords.append(line.strip())
    next_pos = fh.tell()

# Store cursor location in file.
with open('next_pos.txt', 'w') as fh:
    fh.write(str(next_pos))

# Make your API call
# Rinse, Wash, Repeat
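
For reuse, the same idea can be wrapped in a function. This is a sketch under a few assumptions: next_batch is a hypothetical name, a missing or empty position file is treated as "start from the beginning", and myWords.txt is assumed to end with a newline so that keywords appended later start on their own line.

```python
import os


def next_batch(words_path, pos_path, count):
    """Return up to `count` keywords, resuming from the offset saved in `pos_path`."""
    # Start from the beginning if no position has been saved yet.
    next_pos = 0
    if os.path.exists(pos_path):
        with open(pos_path) as fh:
            next_pos = int(fh.read().strip() or 0)

    keywords = []
    with open(words_path) as fh:
        fh.seek(next_pos)
        for _ in range(count):
            line = fh.readline()
            if not line:  # end of file reached
                break
            keywords.append(line.strip())
        next_pos = fh.tell()

    # Remember where this run stopped.
    with open(pos_path, 'w') as fh:
        fh.write(str(next_pos))

    return keywords
```

Because only a file position is stored, keywords appended to the end of the file are returned by later calls without any other change.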

Like I said, you have a lot of options, and I don't know whether any one way is more Pythonic than the others, but whatever you do, try to keep it simple.
