Detecting whether a for-loop item is the last one when yielding items?
I am working with a huge PostgreSQL database, for which I created a "fetch" function.
def fetch(cursor, batch_size=1e3):
    """An iterator that uses fetchmany to keep memory usage down"""
    while True:
        records = cursor.fetchmany(int(batch_size))
        if not records:
            break
        for record in records:
            yield record
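I do some processing on each item, but I now have a problem: in some cases the last item is skipped, because I am comparing each item with the previous one. If that comparison produces nothing for the last item, nothing is done with it.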
import psycopg2

connection = psycopg2.connect(<url>)
cursor = connection.cursor()
cursor.execute(<some query>)
temp_today = 0
for row in fetch(cursor):
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today == 0:
        # do something with first row
        temp_today = date
    # -----------------------------------------
    # I feel like I am missing a statement here
    # something like:
    # if row == rows[-1]:
    #     do something with last row..
    # -----------------------------------------
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass
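How can I handle the last item when I am using yield?
Using yield is really important to me, because I am handling a very large dataset and I will run out of memory if I don't.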
You can define another generator that yields each returned item together with the previous one (if any):
def pair(sequence):
    """Yield (item, previous) tuples; previous is None for the first item."""
    previous = None
    for item in sequence:
        yield (item, previous)
        previous = item

for item, previous_item in pair(mygenerator(args)):
    if previous_item is None:
        # process item: first one returned
        pass
    else:
        # you can compare item and previous_item
        pass
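The same wrapping idea can be turned around to flag the last item instead of the first. The following is only a sketch under stated assumptions: `with_last_flag` and `numbers` are hypothetical names that do not appear in the question, and `numbers()` stands in for a generator such as `fetch(cursor)`. It holds one item of lookahead, so memory use stays constant.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def with_last_flag(iterable: Iterable[T]) -> Iterator[Tuple[T, bool]]:
    """Yield (item, is_last) pairs while buffering only one item of lookahead."""
    iterator = iter(iterable)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in iterator:
        # Another item exists, so `current` cannot be the last one.
        yield current, False
        current = upcoming
    # The source is exhausted, so the buffered item is the last one.
    yield current, True


def numbers():
    # Hypothetical stand-in for a generator such as fetch(cursor).
    yield from (1, 2, 3)


for item, is_last in with_last_flag(numbers()):
    print(item, is_last)
```

Because the flag is computed from the stream itself, this does not depend on knowing the total number of rows in advance.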
Thanks to a comment by @Peter Smit, I went with the following solution:
connection = psycopg2.connect(<url>)
cursor = connection.cursor()
cursor.execute(<some query>)
temp_today = 0
parsed_count = 0
cursor_count = cursor.rowcount
for row in fetch(cursor):
    parsed_count += 1  # count the rows seen so far so the last one can be detected
    item = extract_variables(row)
    date = item['datetime']
    today = date.date()
    if temp_today == 0:
        # do something with first row
        temp_today = date
    elif parsed_count == cursor_count:
        # do something with the last row
        pass
    elif temp_today.date() == today:
        # do something with every row where
        # the date is the same
        pass
    else:
        # do something with every row where
        # the dates ain't the same
        pass
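To make the counting approach easier to try out, here is a minimal sketch that runs without a database. `FakeCursor` and `label_rows` are hypothetical names introduced for illustration, and the sketch assumes the cursor's `rowcount` reflects the full result size after `execute`, which may not hold for every cursor type.

```python
class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor so the sketch runs without a database."""

    def __init__(self, rows):
        self._rows = list(rows)
        # Assumption: the driver reports the result size after execute().
        self.rowcount = len(self._rows)

    def fetchmany(self, size):
        # Hand out the next `size` rows and keep the rest for later calls.
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch


def fetch(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them in batches."""
    while True:
        records = cursor.fetchmany(batch_size)
        if not records:
            break
        yield from records


def label_rows(cursor, batch_size=1000):
    """Yield (row, is_first, is_last) by comparing a running count with rowcount."""
    total = cursor.rowcount
    for count, row in enumerate(fetch(cursor, batch_size), start=1):
        yield row, count == 1, count == total


for row, is_first, is_last in label_rows(FakeCursor(["a", "b", "c"]), batch_size=2):
    print(row, is_first, is_last)
```

With a single-row result both flags are true for the same row, so testing them independently rather than in one if/elif chain lets that row receive both the first-row and the last-row handling.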