Memory Error with function calling large parameters

Question

The program:

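This is a program that tries to create a gibberish sentence. It takes a list of sentence-starting words (seedBank) and a dictionary of word pairs (pairs), built from a text file, that records which word follows which.

For example, a text.txt file containing 'This is a cat. He is a dog.' means we would pass in the following: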

seedBank = ['This', 'He']

pairs = { 'This':['is'],'is':['a','a'],'a':['cat','dog'],'He':['is'] }
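The question doesn't show how seedBank and pairs are produced from text.txt. Assuming sentences end in '.', '?' or '!' and that the last word of a sentence gets no entry of its own, a minimal sketch like this (the helper name `build_inputs` is hypothetical) reproduces the example values above:

```python
import re

def build_inputs(text):
    """Return (seedBank, pairs) for text, treating '.', '?' and '!' as sentence ends."""
    seedBank, pairs = [], {}
    for sentence in re.split(r'[.?!]', text):
        words = sentence.split()
        if not words:
            continue                      # skip the empty piece after the final period
        seedBank.append(words[0])         # first word of each sentence is a seed
        for current, following in zip(words, words[1:]):
            pairs.setdefault(current, []).append(following)
    return seedBank, pairs

seedBank, pairs = build_inputs('This is a cat. He is a dog.')
```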

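The function uses these inputs to create a randomly generated sentence that carries a vague sense of meaning, because it follows a semi-grammatical format.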

import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        gibSentence.append(y)  # random value is added to the word list
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

The problem:

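This program works fine for small inputs like the sentence above (with a fixed random.seed(value) set), but it fails and raises a MemoryError when the inputs (seedBank and pairs) are very large. So my question is: what is wrong with this program that could make it break on larger arguments?

Note that these arguments aren't actually that large. I don't have the text document to hand, but it is not so big that there wouldn't be enough RAM.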

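Error:

Many thanks.

Solved: Thanks! The problem is fixed. It was actually the while condition causing it: the loop kept walking through the entire text instead of stopping once it reached a word ending in a period, question mark, and so on. Essentially this made it run out of memory, but thanks to everyone for the help!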
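A sketch of the fix described above, assuming the words in pairs keep their trailing punctuation (for instance 'cat.') and that '.', '?' and '!' are the only sentence endings. The `max_words` cap is an added assumption, not part of the original fix; it bounds the list even if some cycle in pairs never reaches a sentence-ending word:

```python
import random

# Assumed set of sentence-ending characters; adjust to match the real text.
SENTENCE_END = ('.', '?', '!')

def gibberish_sentence(seedBank, pairs, max_words=100):
    """Follow pairs until a word ends a sentence, has no successors, or max_words is hit."""
    x = random.choice(seedBank)
    gibSentence = [x]
    while (x in pairs
           and not x.endswith(SENTENCE_END)     # stop at 'cat.', 'why?', ...
           and len(gibSentence) < max_words):   # hypothetical safety cap
        x = random.choice(pairs[x])
        gibSentence.append(x)
    return ' '.join(gibSentence)
```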

Answer 1

It's hard to say without your actual pairs, but if all the words refer to each other at some point, you can get an infinite loop:

pairs = { 'someone':['thinks'],'thinks':['that','how'],'that':['someone','anyone'],'how':['someone'], 'anyone': ['thinks'] }

This never finishes.
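To check a real pairs dictionary for this before generating anything, one option is a depth-first search for cycles. This is a sketch, not part of the original answer; `has_cycle` is a hypothetical helper, written iteratively so that a large dictionary does not hit Python's recursion limit. If it returns True, the while loop can run for an unbounded number of steps once it enters that cycle:

```python
def has_cycle(pairs):
    """Return True if some chain of words in pairs leads back to a word already on it."""
    IN_PROGRESS, DONE = 1, 2
    state = {}
    for start in pairs:
        if start in state:
            continue
        state[start] = IN_PROGRESS
        stack = [(start, iter(pairs[start]))]
        while stack:
            word, successors = stack[-1]
            for nxt in successors:
                s = state.get(nxt)
                if s == IN_PROGRESS:
                    return True           # back edge: nxt is on the current path
                if s is None:
                    state[nxt] = IN_PROGRESS
                    stack.append((nxt, iter(pairs.get(nxt, ()))))
                    break                 # descend; this iterator resumes later
            else:
                state[word] = DONE        # every successor fully explored
                stack.pop()
    return False
```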

Answer 2

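Joining a list of strings isn't the worst option, but it isn't the best in terms of space efficiency either.

Consider using something like StringIO (untested, of course):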

try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO         # Python 3
import random

def gibberish_sentence(seedBank, pairs):
    seed = random.choice(seedBank)
    gibSentence = StringIO()
    gibSentence.write(seed)             # random seed
    gibSentence.write(' ')
    x = seed
    while pairs.get(x) is not None:     # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x)) # random value of key x
        gibSentence.write(y)            # random value is added to the buffer
        gibSentence.write(' ')
        x = y                           # key x is reset to y
    return gibSentence.getvalue().rstrip(' ')  # string, without the trailing space

Here's a comparison of different string concatenation methods, in terms of operations per second and memory consumption.
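Rather than relying on either claim, the peak memory of the two approaches can be measured with the standard tracemalloc module (Python 3.4+). This is a hedged sketch: the word list is made-up sample data and the numbers will vary by interpreter and input. Neither approach helps if the loop never ends, since the buffer or the list keeps growing either way:

```python
import io
import tracemalloc

def peak_bytes(build):
    """Call build() and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

words = ['word'] * 100000  # hypothetical sample input

def with_join():
    return ' '.join(words)

def with_stringio():
    buf = io.StringIO()
    for w in words:
        buf.write(w)
        buf.write(' ')
    return buf.getvalue().rstrip(' ')  # drop the trailing separator

joined, join_peak = peak_bytes(with_join)
buffered, sio_peak = peak_bytes(with_stringio)
print('join peak:', join_peak, 'StringIO peak:', sio_peak)
```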

Answer 3

As Tim Pietzcker mentioned, your code can loop forever if there is a cycle in pairs. Here's the most basic example:

>>> seedBank = ['and']
>>> pairs = {'and': ['on'], 'on': ['and']}
>>> gibberish_sentence(seedBank, pairs)  # just keeps going

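You can make sure the generated sentence (eventually) ends by modifying the pairs dictionary so that a word that appears last in a sentence includes a sentinel value. For example, for a source text like 'You and me and the dog.':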

seedBank = ['You']

pairs = {
    'You': ['and'],
    'and': ['me', 'the'],
    'me': ['and'],
    'the': ['dog'],
    'dog': ['.'],
}

... and add a check for the sentinel in gibberish_sentence():

import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        if y == '.':  # sentinel: the sentence ends here
            break
        gibSentence.append(y)  # random value is added to the word list
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

...which gives the sentence a chance to terminate:

>>> gibberish_sentence(seedBank, pairs)
'You and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and me and me and me and me and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and the dog'
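The sentinel entries don't have to be added by hand. Assuming a sentence-ending word is written with a trailing '.', a sketch like this (the name `build_pairs_with_sentinel` is hypothetical) derives the seedBank and pairs shown above from the source text:

```python
def build_pairs_with_sentinel(text, sentinel='.'):
    """Build (seedBank, pairs); every sentence-final word also maps to sentinel."""
    seedBank, pairs = [], {}
    start_of_sentence = True
    prev = None
    for token in text.split():
        ends_sentence = token.endswith(sentinel)
        word = token.rstrip(sentinel)
        if start_of_sentence:
            seedBank.append(word)                     # first word of a sentence is a seed
        else:
            pairs.setdefault(prev, []).append(word)   # record prev -> word
        if ends_sentence:
            pairs.setdefault(word, []).append(sentinel)  # give the loop a way out
        prev = word
        start_of_sentence = ends_sentence
    return seedBank, pairs
```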

Answer 4

You can avoid building the list by using a generator, which is very memory-efficient.

import random

def gibberish_sentence(seedBank, pairs):
    x = random.choice(seedBank)          # random seed
    yield x
    while pairs.get(x) is not None:      # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        yield y
        x = y                            # key x is reset to y

print(' '.join(gibberish_sentence(seedBank, pairs)))  # string

Or, if the string must be built inside the function, it can be done like this:

def gibberish_sentence(seedBank, pairs):
    def words():
        x = random.choice(seedBank)          # random seed
        yield x
        while pairs.get(x) is not None:      # loop while x is a key in the dictionary
            y = random.choice(pairs.get(x))  # random value of key x
            yield y
            x = y                            # key x is reset to y
    return ' '.join(words())                 # string
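A generator avoids holding the words in gibberish_sentence itself, but ' '.join still has to consume every word it yields (in CPython it first collects the generator's items into a sequence), so with a cycle in pairs it never returns either. One way to bound it, sketched here with the standard itertools.islice (the `gibberish_words` name and the `max_words` default of 50 are assumptions), is:

```python
import random
from itertools import islice

def gibberish_words(seedBank, pairs):
    """Yield words lazily; this is infinite if pairs contains a reachable cycle."""
    x = random.choice(seedBank)
    yield x
    while x in pairs:
        x = random.choice(pairs[x])
        yield x

def gibberish_sentence(seedBank, pairs, max_words=50):
    # islice stops pulling from the generator after max_words items,
    # so ' '.join always receives a finite number of words.
    return ' '.join(islice(gibberish_words(seedBank, pairs), max_words))

print(gibberish_sentence(['and'], {'and': ['on'], 'on': ['and']}, max_words=5))
```

With the cyclic pairs from the earlier answer, this prints "and on and on and" instead of looping forever.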


python

memory