Python 3 Generator Function Returning the Same Value

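I am trying to build a batch generator that takes a large Pandas DataFrame as input and outputs it in batches of a given number of rows (batch_size). I practised on a smaller 10-row DataFrame to get it working. I am having trouble with the generator function. The for loop below works fine on the practice DataFrame and emits batches of the specified size: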

for i in range(0, len(df), 3):
    lower = i
    upper = i + 3
    print(df.iloc[lower:upper])

However, building this into a generator function has proven difficult:

def Generator(batch_size, seed = None):
num_items = len(df)
x = df.sample(frac = 1, replace = False, random_state = seed)
for offset in range(0, num_items, batch_size):
    lower_limit = offset
    upper_limit = offset+batch_size
    batch = x.iloc[lower_limit:upper_limit]
    yield batch

Unfortunately:

next(Generator(1))  # e.g. batch_size = 1

returns the same row over and over.

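I am fairly new to generators and feel I must be missing something, but I cannot find what. If anyone could point out what the problem might be, I would be very grateful.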

Edit: the DataFrame is predefined, and it is:

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'], 
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'], 
    'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24], 
    'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
    'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df

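Create an iterator from the result of calling your generator function, and call next() on that iterator. Otherwise every call to the function creates a brand-new generator with its own fresh "state", and if you provide a seed, each of those new generators will start with the same "first row".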

After fixing the indentation problem, it works:

import pandas as pd

# I dislike variable scope bleeding into the function, provide df explicitly
def Generator(df, batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch


raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 
                           'Gueniva', 'Know', 'Sara', 'Cat'], 
    'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 
                  'Jaker', 'Alom', 'Ormon', 'Koozer'], 
    'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24], 
    'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
    'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}

df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 
                                       'preTestScore', 'postTestScore'])


# capture a "state" for the generator function
i = iter(Generator(df, 2)) 

# get the next states from the iterator and print
print(next(i))
print(next(i))
print(next(i))

Output:

  first_name last_name  age  preTestScore  postTestScore
8       Sara     Ormon   73            26            234
6    Gueniva     Jaker   26            52             52


  first_name last_name  age  preTestScore  postTestScore
5      Sarah    Mornig   53            13             82
9        Cat    Koozer   24            26            254

  first_name last_name  age  preTestScore  postTestScore
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57

Or you can do this:

k = Generator(df, 1)
print(next(k))
print(next(k))
print(next(k))

which works just as well.
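In practice a generator is usually consumed with a for loop, which calls next() on one generator object until it is exhausted. The following is only a minimal sketch: it repeats the Generator logic from above so that it runs on its own, and the one-column DataFrame `demo` is a made-up stand-in, not the data from the question.

```python
import pandas as pd

def Generator(df, batch_size, seed=None):
    # Same logic as the function above: shuffle once, then yield row slices.
    shuffled = df.sample(frac=1, replace=False, random_state=seed)
    for offset in range(0, len(shuffled), batch_size):
        yield shuffled.iloc[offset:offset + batch_size]

# Made-up stand-in data (10 rows), not the DataFrame from the question.
demo = pd.DataFrame({"value": range(10)})

# The for loop advances a single generator, so every row is visited once.
sizes = []
seen = []
for batch in Generator(demo, 3, seed=0):
    sizes.append(len(batch))
    seen.extend(batch["value"].tolist())

print(sizes)  # the last batch is shorter because 10 is not a multiple of 3
```

Note that the final batch holds only the leftover rows, so code downstream should not assume every batch has exactly batch_size rows.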

If you do this

print(next(Generator(df, 2)))    
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))

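you create three separately shuffled DataFrames, which may show you the same rows, because you only print the first "iteration" of each one before it is discarded.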
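The difference can be reproduced without pandas. The sketch below is a hypothetical list-based stand-in (the name `batches` and the use of `random.Random` are assumptions made for illustration, not part of the original code); it shuffles a list instead of a DataFrame but has the same generator structure.

```python
import random

def batches(items, batch_size, seed=None):
    # Hypothetical stand-in for Generator: shuffle a copy, then yield chunks.
    shuffled = random.Random(seed).sample(items, len(items))
    for offset in range(0, len(shuffled), batch_size):
        yield shuffled[offset:offset + batch_size]

data = list(range(10))

# A fresh generator on every call: each one reshuffles with the same seed
# and is discarded after yielding only its first batch.
first_a = next(batches(data, 2, seed=0))
first_b = next(batches(data, 2, seed=0))
print(first_a == first_b)  # True: same seed, same first batch

# One generator kept in a variable: successive next() calls move forward.
gen = batches(data, 2, seed=0)
kept = [next(gen), next(gen), next(gen)]
print(kept)
```

With seed=None the fresh generators would usually produce different first batches, but each call would still only ever show the first batch of a new shuffle.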