Python 3 Generator Function Returning the Same Value
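I'm trying to build a batch generator that takes a large Pandas DataFrame as input and outputs it in batches of a given number of rows (batch_size). I practised on a smaller DataFrame of 10 rows to get it working. I'm having trouble with the generator function; the for loop below works fine on the practice DataFrame and spits out batches of the specified size: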
for i in range(0, len(df), 3):
    lower = i
    upper = i+3
    print(df.iloc[lower:upper])
However, building this into a generator function is proving difficult:
def Generator(batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
Unfortunately:
next(Generator(1))  # e.g. with batch_size = 1
returns the same row over and over.
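I'm still fairly new to generators and feel I must be missing something, but I can't work out what.
I'd be grateful if anyone could point out what the problem might be.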
Edit:
The DataFrame is predefined; it is:
import pandas as pd
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'],
            'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
            'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
            'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
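Create an iterator from the result of calling the generator function, and call next() on that iterator. Otherwise, every call to Generator(...) creates a brand-new generator "state", and those fresh generators may all have the same "first line" if you provide a seed.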
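The difference can be seen without pandas at all. Below is a minimal sketch using a hypothetical stand-in generator, count_up (not part of your code), that contrasts calling next() on a fresh generator each time with calling it on one stored generator object:

```python
def count_up(limit):
    """Hypothetical stand-in for the batch generator: yields 0, 1, ..., limit - 1."""
    for value in range(limit):
        yield value

# Calling the generator function each time builds a brand-new generator,
# so every next() starts again from the beginning.
fresh = [next(count_up(3)) for _ in range(3)]

# Keeping one generator object preserves its position between next() calls.
gen = count_up(3)
reused = [next(gen) for _ in range(3)]

print(fresh)   # [0, 0, 0]
print(reused)  # [0, 1, 2]
```

The same reasoning applies to a generator that yields DataFrame slices instead of integers.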
After fixing the indentation problems, your generator works:
import pandas as pd
# I dislike variable scope bleeding into the function, provide df explicitly
def Generator(df, batch_size, seed = None):
    num_items = len(df)
    x = df.sample(frac = 1, replace = False, random_state = seed)
    for offset in range(0, num_items, batch_size):
        lower_limit = offset
        upper_limit = offset+batch_size
        batch = x.iloc[lower_limit:upper_limit]
        yield batch
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Sarah',
                           'Gueniva', 'Know', 'Sara', 'Cat'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Mornig',
                          'Jaker', 'Alom', 'Ormon', 'Koozer'],
            'age': [42, 52, 36, 24, 73, 53, 26, 72, 73, 24],
            'preTestScore': [4, 24, 31, 2, 3, 13, 52, 72, 26, 26],
            'postTestScore': [25, 94, 57, 62, 70, 82, 52, 56, 234, 254]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
                                       'preTestScore', 'postTestScore'])
# capture a "state" for the generator function
i = iter(Generator(df, 2))
# get the next states from the iterator and print
print(next(i))
print(next(i))
print(next(i))
Output:
  first_name last_name  age  preTestScore  postTestScore
8       Sara     Ormon   73            26            234
6    Gueniva     Jaker   26            52             52
  first_name last_name  age  preTestScore  postTestScore
5      Sarah    Mornig   53            13             82
9        Cat    Koozer   24            26            254
  first_name last_name  age  preTestScore  postTestScore
1      Molly  Jacobson   52            24             94
2       Tina       Ali   36            31             57
Alternatively, you can do:
k = Generator(df, 1)
print(next(k))
print(next(k))
print(next(k))
which works as well.
If you do
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
print(next(Generator(df, 2)))
you create three separate shuffled DataFrames, which may show you the same rows, because you only print the first "iteration" of each one before it is discarded.
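To make this concrete, here is a rough sketch under a few assumptions: batch_generator is a hypothetical, condensed version of the Generator function above, and demo is a small made-up frame rather than your data. It shows that driving a single generator to exhaustion covers every row exactly once, whereas two fresh generators built with the same seed hand back an identical first batch:

```python
import pandas as pd

def batch_generator(df, batch_size, seed=None):
    """Yield shuffled batches of at most batch_size rows (condensed form of Generator)."""
    shuffled = df.sample(frac=1, replace=False, random_state=seed)
    for offset in range(0, len(shuffled), batch_size):
        yield shuffled.iloc[offset:offset + batch_size]

# Small illustrative frame, not the data from the question.
demo = pd.DataFrame({"value": range(10)})

# list(), like a for loop, calls next() on ONE generator object until it is exhausted.
batches = list(batch_generator(demo, 3, seed=0))
sizes = [len(batch) for batch in batches]
print(sizes)  # [3, 3, 3, 1]

# Across all batches, every row appears exactly once.
seen = sorted(pd.concat(batches)["value"].tolist())

# With a fixed seed, two fresh generators produce the same first batch.
first_a = next(batch_generator(demo, 3, seed=0))
first_b = next(batch_generator(demo, 3, seed=0))
same_first = first_a.equals(first_b)
print(same_first)  # True
```

In practice you would usually write for batch in batch_generator(demo, 3): ... and let the loop handle the next() calls for you.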