Detecting if for-loop item is the last when yielding items?

Question

I'm working with a huge PostgreSQL database, for which I have created a "fetch" function.

def fetch(cursor, batch_size=1e3):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        records = cursor.fetchmany(int(batch_size))
        if not records:
            break
        for record in records:
            yield record

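I do some processing on each item, but I now have a problem: in some cases the last item is skipped, because the processing is driven by comparing items with each other. If the comparison on the last item does not trigger anything, nothing is done with it.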

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = 0

for row in fetch(cursor):
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today == 0:
        # do something with first row
        temp_today = date
    # -----------------------------------------
    # I feel like I am missing a statement here
    # something like:
    # if row == rows[-1]:
    #     do something with last row..
    # -----------------------------------------
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass

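How can I handle the last item when I'm using yield?

Using yield is very important to me, because I'm handling a very large dataset and I would run out of memory without it.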

Answer 1

You can define another generator that iterates over the returned items together with the previous one (if any):

def pair(sequence):
    """Yield (item, previous) tuples; previous is None for the first item."""
    previous = None
    for item in sequence:
        yield (item, previous)
        previous = item

for item, previous_item in pair(mygenerator(args)):
    if previous_item is None:
        # process item: first one returned
        pass
    else:
        # you can compare item and previous_item
        pass
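The same idea can be turned around to look one item ahead instead of one behind, which lets the loop know when it has reached the last item. The following is only a minimal sketch: `with_last_flag` is a name made up for this example, and it holds back a single item at a time, so memory use stays flat.

```python
def with_last_flag(iterable):
    """Yield (item, is_last) pairs, buffering only one item at a time."""
    iterator = iter(iterable)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in iterator:
        # Another item exists, so `current` cannot be the last one.
        yield current, False
        current = upcoming
    # The input is exhausted, so `current` is the final item.
    yield current, True


# Works on any iterable, including a generator.
flags = list(with_last_flag(x * x for x in range(4)))
print(flags)  # [(0, False), (1, False), (4, False), (9, True)]
```

In the question's loop this could be used as `for row, is_last in with_last_flag(fetch(cursor)):`.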

Answer 2

Thanks to @Peter Smit's comment, I used the following solution:

connection = psycopg2.connect(<url>)
cursor = connection.cursor()

cursor.execute(<some query>)

temp_today = 0
parsed_count = 0
# total number of rows produced by the query
cursor_count = cursor.rowcount

for row in fetch(cursor):
    # count the current row so the last one can be recognised
    parsed_count += 1
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today == 0:
        # do something with first row
        temp_today = date
    elif parsed_count == cursor_count:
        # do something with the last row
        pass
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass
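Note that the DB-API allows `rowcount` to be -1 when the driver cannot determine the number of rows, so the comparison above only works when the cursor actually reports a count. If the real goal is to process rows day by day, one possible alternative is `itertools.groupby`, which emits the final group when the input ends and so needs no "last row" check. This is only a sketch under assumptions: the rows must arrive ordered by datetime (for example via `ORDER BY`), and `sample_rows`, `totals_per_day` and the `value` field are hypothetical stand-ins for `fetch(cursor)` and the real processing.

```python
from datetime import datetime
from itertools import groupby


def sample_rows():
    """Hypothetical stand-in for fetch(cursor); yields rows ordered by datetime."""
    yield {"datetime": datetime(2020, 1, 1, 9, 0), "value": 1}
    yield {"datetime": datetime(2020, 1, 1, 17, 30), "value": 2}
    yield {"datetime": datetime(2020, 1, 2, 8, 15), "value": 3}


def totals_per_day(rows):
    """Yield (date, total) for each run of consecutive rows sharing a date."""
    for day, day_rows in groupby(rows, key=lambda row: row["datetime"].date()):
        # day_rows is a lazy iterator over one day's rows only.
        yield day, sum(row["value"] for row in day_rows)


daily = list(totals_per_day(sample_rows()))
print(daily)
```

Because `groupby` consumes the generator lazily, only the rows being summed for the current day are touched at any moment.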

Tags: python, postgresql, iterator, yield, python-3.x