Writing a yield generator function based on a given task

This is the part of the code that is supposed to run the record search in batches of 1000 records:

  for subrange, batch in batched(records, size=1000):
      print("Processing records %d-%d" %
        (subrange[0], subrange[-1]))
      process(batch)

I need to write a yield generator function for it. This is what I have tried so far:

def batched(records, chunk_size=1000):
    """Lazy function (generator) to read records piece by piece.
    Default chunk size: 1k."""
    while True:
        data = records.read(chunk_size)
        if not data:
            break
        yield data

The problem statement is as follows:

For optimal performance, records should be processed in batches.
Create a generator function "batched" that will yield batches of 1000
records at a time 

I'm also not quite sure how to test this function, so any ideas?

PS: The batched generator function should be defined before the given for subrange loop.

def batched(records, chunk_size=1000):
    """Lazy function (generator) to read records piece by piece.
    Default chunk size: 1k."""
    pos = 0
    while True:
        data = records.read(chunk_size)
        if not data:
            break
        yield ([pos, pos + len(data)], data)
        pos += len(data)

The loop code you were given

for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" %
      (subrange[0], subrange[-1]))
    process(batch)

places these implicit requirements on batched():

  1. It should return an iterable. A generator function does indeed provide that.
  2. The yielded items should be tuples subrange, batch. The subrange could be a list of the indices of all elements in the batch, just a list or tuple of the start and end indices, or perhaps a range() object. I will assume the latter (a short demo follows this list).
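
As a quick aside (this demo is mine, not part of the original post): a range() object can be indexed like a list, so subrange[0] and subrange[-1] give exactly the first and last record number that the print statement in your loop expects.

subrange = range(2000, 3000)   # hypothetical batch covering records 2000..2999
print(subrange[0])             # 2000 -> first record index in the batch
print(subrange[-1])            # 2999 -> last record index (not 3000)
print(len(subrange))           # 1000 -> batch size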

Alas, we know nothing about the given records object. If it has a read() method, your approach can be adapted:

def batched(records, size=1000):
    """Generator function to read records piece by piece.
    Default chunk size: 1k."""
    index = 0
    while True:
        data = records.read(size)
        if not data:
            break
        yield range(index, index + len(data)), data
        index += len(data)
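
To try this variant without knowing anything about the real records object, you can feed it a small stand-in. FakeRecords below is purely hypothetical; it only exists so that read(n) returns the next n records as a list:

class FakeRecords:
    """Hypothetical stand-in: read(n) returns the next n records as a list."""
    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def read(self, n):
        chunk = self.items[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

records = FakeRecords(range(2500))
for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" % (subrange[0], subrange[-1]))
# prints:
# Processing records 0-999
# Processing records 1000-1999
# Processing records 2000-2499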

But if records is just a list that needs to be split into chunks, you can do this instead:

def batched(records, size=1000):
    """Generator function to read records piece by piece.
    Default chunk size: 1k."""
    index = 0
    while True:
        data = records[index:index + size]
        if not data:
            break
        yield range(index, index + len(data)), data
        index += len(data)
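
As for testing: with the list variant you can pass in a plain Python list and check the yielded subranges and batch sizes directly, for example with a few asserts. A minimal sketch, assuming the list-based batched() above:

records = list(range(2500))                 # 2500 dummy records

batches = list(batched(records, size=1000))
assert len(batches) == 3
assert [len(batch) for _, batch in batches] == [1000, 1000, 500]
assert [(r[0], r[-1]) for r, _ in batches] == [(0, 999), (1000, 1999), (2000, 2499)]

# or simply run the given loop and inspect the output
for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" % (subrange[0], subrange[-1]))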