How to make iter method of a class return a value without running the for loop?

Question

I have a class with an __iter__ method that looks like this:

import csv

# text_file, convo_split and preprocess are defined elsewhere in my code

class Mycorpus:

    '''This class helps us to train the model without loading the whole dataset to the RAM.'''

    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:

                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])

                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)

                yield client_tokens

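I pass this object to a function that expects it to yield one set of tokens per iteration, i.e. either client_tokens or agent_tokens. I want __iter__ to yield client_tokens and then, on the next iteration, yield agent_tokens from the same client-agent pair. I don't want to yield both sets of tokens together, because that would break the function; it has to be one at a time. My main objective is to avoid looping over the file twice and running the splitter on the same conversation again.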

I tried something like the following.

def __init__(self, filepath= text_file):
        self.filepath = filepath
        self.agent_turn = 0

def __iter__(self):
        with open(self.filepath,'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
 
            if self.agent_turn:
                self.agent_turn = 0
                yield agent_tokens
            
            else:
                for row in csv_reader:
                
                    # splitter splits the conversation into client and agent part
                    client_convo, agent_convo = convo_split.splitter(row['Combined'])

                    client_tokens = preprocess(client_convo)
                    agent_tokens = preprocess(agent_convo)
                    self.agent_turn = 1
                    yield client_tokens

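But the code above only yields client_tokens. Is there a better way to do this without loading the whole dataset into memory? Is what I am asking for even possible with the __iter__ method? Any help or guidance is much appreciated.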

Answer 1

Use two yield statements, as many examples show. Remember that a generator resumes right after the yield statement where it paused, not at the top of the function.

        for row in csv_reader:
    
            # splitter splits the conversation into client and agent part
            client_convo, agent_convo = convo_split.splitter(row['Combined'])

            client_tokens = preprocess(client_convo)
            agent_tokens = preprocess(agent_convo)
            
            yield client_tokens
            yield agent_tokens
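
For anyone who wants to try this end to end, here is a minimal runnable sketch of the same idea. It is only an illustration under stated assumptions: split_conversation and preprocess below are hypothetical stand-ins for the asker's convo_split.splitter and preprocess (which are not shown in the question), the "|" separator is invented for the sample data, and the CSV is a throwaway temp file.

```python
import csv
import os
import tempfile


def split_conversation(text):
    # Hypothetical stand-in for convo_split.splitter: assumes the client and
    # agent parts of a conversation are separated by a "|" character.
    client_part, agent_part = text.split("|", 1)
    return client_part, agent_part


def preprocess(text):
    # Hypothetical stand-in for the question's preprocess(): lowercase the
    # text and split it on whitespace.
    return text.lower().split()


class Mycorpus:
    """Stream token lists from a CSV file one row at a time."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, "r", newline="") as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                client_convo, agent_convo = split_conversation(row["Combined"])
                # Execution pauses at each yield and resumes on the line after
                # it, so every row is read and split exactly once.
                yield preprocess(client_convo)
                yield preprocess(agent_convo)


# Build a tiny sample file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.DictWriter(tmp, fieldnames=["Combined"])
    writer.writeheader()
    writer.writerow({"Combined": "Hi there|Hello how can I help"})
    writer.writerow({"Combined": "My order is late|Let me check"})
    sample_path = tmp.name

# The consumer sees client tokens, then agent tokens, for each row in turn.
token_lists = list(Mycorpus(sample_path))
os.remove(sample_path)
```

Because __iter__ is a generator function, every new for loop over the object opens the file again and starts a fresh pass, so the same instance can be iterated several times (for example, once per training epoch) without holding the dataset in memory.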

Tags: python, iteration, memory-management, yield, dataframe
