Memory issues with a list of lists

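I am having some memory issues and I am wondering if there is any way to free up some memory in the code below. I have tried using a generator expression instead of a list comprehension, but that does not produce unique combinations, because the memory is freed up.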

The list of lists (combinations) causes me to run out of memory, and the program does not finish.

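The end result would be 729 lists in this list, each containing 6 WindowsPath elements that point to images. I have tried storing the lists as strings in a text file, but I could not get that to work; I also tried a pandas dataframe, but could not get that to work either.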

I need to come up with a different solution. The output right now is exactly what I need; memory is the only issue.

from pathlib import Path
from random import choice
from itertools import product
from PIL import Image
import sys

def combine(arr):
    return list(product(*arr))

def generate(x):

    #set new value for name
    name = int(x)

    #Turn name into string for file name
    img_name = str(name)

    #Pick 1 random from each directory, add to list.
    a_paths = [choice(k) for k in layers]

    #if the length of the list of unique combinations is equal to the number of total combinations, this function stops
    if len(combinations) == len(combine(layers)):
        print("Done")
        sys.exit()

    else:
        #If combination exists, generate new list
        if any(j == a_paths for j in combinations) == True:
            print("Redo")
            generate(name)

        #Else, initialize new image, paste layers + save image, add combination to list, and generate new list
        else:
            #initialize image
            img = Image.new("RGBA", (648, 648))
            png_info = img.info

            #For each path in the list, paste on top of previous, sets image to be saved
            for path in a_paths:
                layer = Image.open(str(path), "r")
                img.paste(layer, (0, 0), layer)

            print(str(name) + ' - Unique')
            img.save(img_name + '.png', **png_info)
            combinations.append(a_paths)
            name = name - 1
            generate(name)

'''
Main method
'''
global layers
layers = [list(Path(directory).glob("*.png")) for directory in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")]

#name will dictate the name of the file output(.png image) it is equal to the number of combinations of the image layers
global name
name = len(combine(layers))

#combinations is the list of lists that will store all unique combinations of images
global combinations
combinations = []

#calling recursive function
generate(name)

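Let's start with an MRE version of your code (i.e. something I can run without needing a bunch of PNGs; all we care about here is how to get through the images without hitting the recursion limit):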

from random import choice
from itertools import product


def combine(arr):
    return list(product(*arr))


def generate(x):

    # set new value for name
    name = int(x)

    # Turn name into string for file name
    img_name = str(name)

    # Pick 1 random from each directory, add to list.
    a_paths = [choice(k) for k in layers]

    # if the length of the list of unique combinations is equal to the number of total combinations, this function stops
    if len(combinations) == len(combine(layers)):
        print("Done")
        return

    else:
        # If combination exists, generate new list
        if any(j == a_paths for j in combinations) == True:
            print("Redo")
            generate(name)

        # Else, initialize new image, paste layers + save image, add combination to list, and generate new list
        else:
            # initialize image
            img = []

            # For each path in the list, paste on top of previous, sets image to be saved
            for path in a_paths:
                img.append(path)

            print(str(name) + ' - Unique')
            print(img_name + '.png', img)
            combinations.append(a_paths)
            name = name - 1
            generate(name)


'''
Main method
'''
global layers
layers = [
    [f"{d}{f}.png" for f in ("foo", "bar", "baz", "ola", "qux")]
    for d in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")
]


# name will dictate the name of the file output(.png image) it is equal to the number of combinations of the image layers
global name
name = len(combine(layers))

# combinations is the list of lists that will store all unique combinations of images
global combinations
combinations = []

# calling recursive function
generate(name)

When I run this, I get some output that starts with:

15625 - Unique
15625.png ['dir1/qux.png', 'dir2/bar.png', 'dir3/bar.png', 'dir4/foo.png', 'dir5/baz.png', 'dir6/foo.png']
15624 - Unique
15624.png ['dir1/baz.png', 'dir2/qux.png', 'dir3/foo.png', 'dir4/foo.png', 'dir5/foo.png', 'dir6/foo.png']
15623 - Unique
15623.png ['dir1/ola.png', 'dir2/qux.png', 'dir3/bar.png', 'dir4/ola.png', 'dir5/ola.png', 'dir6/bar.png']
...

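and ends with a RecursionError. I assume this is what you meant by "running out of memory". In practice I don't seem to come anywhere close to running out of memory (maybe this would behave differently with actual images?), but Python's stack depth is finite, and this function recurses into itself to arbitrary depth for no particularly good reason.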
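
To make the stack-depth point concrete, here is a minimal sketch that is separate from the question's code. The names `count_down` and `count_down_loop` are made up for illustration. It assumes a default CPython setup, where a function that calls itself once per step fails as soon as the number of steps exceeds `sys.getrecursionlimit()`, while the equivalent loop has no such limit.

```python
import sys


def count_down(n: int) -> int:
    """Recurse once per step, much like generate() recursing once per image."""
    if n == 0:
        return 0
    return 1 + count_down(n - 1)


def count_down_loop(n: int) -> int:
    """Same work as count_down, but with a loop instead of recursion."""
    steps = 0
    while n > 0:
        steps += 1
        n -= 1
    return steps


limit = sys.getrecursionlimit()

# Asking for more nested calls than the interpreter allows raises RecursionError.
try:
    count_down(limit + 100)
    hit_limit = False
except RecursionError:
    hit_limit = True

print("recursion limit:", limit)
print("recursive version hit the limit:", hit_limit)

# The loop handles one step per combination without growing the call stack.
print("loop version steps:", count_down_loop(15625))
```

Raising the limit with `sys.setrecursionlimit` would only postpone the failure; replacing the recursion with a loop removes it.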

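Since you are trying to eventually generate every possible combination, you already have a perfectly good solution, one you are even already using: itertools.product. All you have to do is iterate through the combinations it gives you. You don't need recursion, and you don't need global variables.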

from itertools import product
from typing import List


def generate(layers: List[List[str]]) -> None:
    for name, a_paths in enumerate(product(*layers), 1):
        # initialize image
        img = []

        # For each path in the list, paste on top of previous,
        # sets image to be saved
        for path in a_paths:
            img.append(path)

        print(f"{name} - Unique")
        print(f"{name}.png", img)

    print("Done")


'''
Main method
'''
layers = [
    [f"{d}{f}.png" for f in ("foo", "bar", "baz", "ola", "qux")]
    for d in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")
]

# calling iterative function
generate(layers)

Now we get all of the combinations, with names starting at 1 and going all the way up to 15625:

1 - Unique
1.png ['dir1/foo.png', 'dir2/foo.png', 'dir3/foo.png', 'dir4/foo.png', 'dir5/foo.png', 'dir6/foo.png']
2 - Unique
2.png ['dir1/foo.png', 'dir2/foo.png', 'dir3/foo.png', 'dir4/foo.png', 'dir5/foo.png', 'dir6/bar.png']
3 - Unique
3.png ['dir1/foo.png', 'dir2/foo.png', 'dir3/foo.png', 'dir4/foo.png', 'dir5/foo.png', 'dir6/baz.png']
...
15623 - Unique
15623.png ['dir1/qux.png', 'dir2/qux.png', 'dir3/qux.png', 'dir4/qux.png', 'dir5/qux.png', 'dir6/baz.png']
15624 - Unique
15624.png ['dir1/qux.png', 'dir2/qux.png', 'dir3/qux.png', 'dir4/qux.png', 'dir5/qux.png', 'dir6/ola.png']
15625 - Unique
15625.png ['dir1/qux.png', 'dir2/qux.png', 'dir3/qux.png', 'dir4/qux.png', 'dir5/qux.png', 'dir6/qux.png']
Done
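
As a sanity check on the 15625 figure, the total can be computed without building any list at all. This is a small sketch that assumes the same mock `layers` as above. `math.prod` needs Python 3.8 or later, and the names `total` and `first_two` are introduced here for illustration only.

```python
from itertools import islice, product
from math import prod

# Same mock layers as in the example above: 6 directories, 5 files each.
layers = [
    [f"{d}{f}.png" for f in ("foo", "bar", "baz", "ola", "qux")]
    for d in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")
]

# The number of combinations is the product of the layer sizes: 5 ** 6.
total = prod(len(layer) for layer in layers)
print(total)

# product() is lazy, so peeking at the first few combinations does not
# require materializing all of them.
first_two = list(islice(product(*layers), 2))
for combo in first_two:
    print(combo)
```

The original code called `len(combine(layers))` on every call to `generate`, which rebuilds the whole list each time. Computing the count once like this avoids that. The 729 mentioned in the question would correspond to three files per directory (3 ** 6), though that is an inference about the real data rather than something the question states.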

Swapping the actual image-generation code back in for my mocked version is left as an exercise for the reader.

If you want to randomize the order of the combinations, that is a perfectly reasonable thing to do:

from random import shuffle

...

    combinations = list(product(*layers))
    shuffle(combinations)
    for name, a_paths in enumerate(combinations, 1):
        ...

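This uses more memory (since you are now building a list of the product instead of iterating through a generator), but the number of images you are dealing with isn't actually that large, so it's fine as long as you aren't adding a level of recursion for each image.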
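
For completeness, here is one way the shuffled variant might look as a self-contained script, again with the mock `layers`. The helper name `shuffled_combinations` and its optional `seed` parameter are assumptions added for illustration (a seed makes the order reproducible); they are not part of the original code.

```python
from itertools import product
from random import Random
from typing import List, Optional, Tuple


def shuffled_combinations(
    layers: List[List[str]], seed: Optional[int] = None
) -> List[Tuple[str, ...]]:
    """Return every combination exactly once, in random order."""
    # Materialize the full product so it can be shuffled in place.
    combinations = list(product(*layers))
    # A dedicated Random instance keeps the global random state untouched.
    Random(seed).shuffle(combinations)
    return combinations


layers = [
    [f"{d}{f}.png" for f in ("foo", "bar", "baz", "ola", "qux")]
    for d in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")
]

combos = shuffled_combinations(layers, seed=0)

# Print only the first few; the real loop would build and save one image each.
for name, a_paths in enumerate(combos[:3], 1):
    print(f"{name}.png", list(a_paths))

print("total:", len(combos))
```

Because the list is shuffled rather than sampled with `choice`, every combination appears exactly once, so no duplicate check (and no `combinations` bookkeeping list) is needed.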