Writing a yield generator function based on a given task

This is the part of the code that is supposed to run the record search in batches of 1000 records:

  for subrange, batch in batched(records, size=1000):
      print("Processing records %d-%d" %
        (subrange[0], subrange[-1]))
      process(batch)

I need to write a yield generator function for it. This is what I have tried so far:

def batched(records, chunk_size=1000):
    """Lazy function (generator) to read records piece by piece.
    Default chunk size: 1k."""
    while True:
        data = records.read(chunk_size)
        if not data:
            break
        yield data

The problem statement is as follows:

For optimal performance, records should be processed in batches.
Create a generator function "batched" that will yield batches of 1000
records at a time 

I'm also not quite sure how to test this function, so any ideas?

PS: The batched generator function should be defined before the given for subrange loop.

def batched(records, chunk_size=1000):
    """Lazy function (generator) to read records piece by piece.
    Default chunk size: 1k."""
    pos = 0
    while True:
        data = records.read(chunk_size)
        if not data:
            break
        yield ([pos, pos + len(data)], data)
        pos += len(data)

The loop code you were given

for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" %
      (subrange[0], subrange[-1]))
    process(batch)

places these implicit requirements on batched():

  1. It should return an iterable. A generator function does indeed provide that.
  2. The yielded items should be tuples subrange, batch. The subrange could be a list of the indices of all elements in the batch, just a list or tuple of the start and end indices, or perhaps a range() object. I will assume the latter (a short demo follows this list).
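
As a quick aside (this demo is mine, not part of the original post): a range() object can be indexed like a list, so subrange[0] and subrange[-1] give exactly the first and last record number that the print statement in your loop expects.

subrange = range(2000, 3000)   # hypothetical batch covering records 2000..2999
print(subrange[0])             # 2000 -> first record index in the batch
print(subrange[-1])            # 2999 -> last record index (not 3000)
print(len(subrange))           # 1000 -> batch size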

Alas, we know nothing about the given records object. If it has a read() method, your approach can be adapted:

def batched(records, size=1000):
    """Generator function to read records piece by piece.
    Default chunk size: 1k."""
    index = 0
    while True:
        data = records.read(size)
        if not data:
            break
        yield range(index, index + len(data)), data
        index += len(data)
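
To try this variant without knowing anything about the real records object, you can feed it a small stand-in. FakeRecords below is purely hypothetical; it only exists so that read(n) returns the next n records as a list:

class FakeRecords:
    """Hypothetical stand-in: read(n) returns the next n records as a list."""
    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def read(self, n):
        chunk = self.items[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

records = FakeRecords(range(2500))
for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" % (subrange[0], subrange[-1]))
# prints:
# Processing records 0-999
# Processing records 1000-1999
# Processing records 2000-2499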

But if records is just a list that needs to be split into chunks, you can do this instead:

def batched(records, size=1000):
    """Generator function to read records piece by piece.
    Default chunk size: 1k."""
    index = 0
    while True:
        data = records[index:index + size]
        if not data:
            break
        yield range(index, index + len(data)), data
        index += len(data)
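
As for testing: with the list variant you can pass in a plain Python list and check the yielded subranges and batch sizes directly, for example with a few asserts. A minimal sketch, assuming the list-based batched() above:

records = list(range(2500))                 # 2500 dummy records

batches = list(batched(records, size=1000))
assert len(batches) == 3
assert [len(batch) for _, batch in batches] == [1000, 1000, 500]
assert [(r[0], r[-1]) for r, _ in batches] == [(0, 999), (1000, 1999), (2000, 2499)]

# or simply run the given loop and inspect the output
for subrange, batch in batched(records, size=1000):
    print("Processing records %d-%d" % (subrange[0], subrange[-1]))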