Python parallelization for code to combine multiple images

Question

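I am new to Python and am trying to parallelize a program that I pieced together from the Internet. The program reads all the image files in a given folder (usually several series of images, such as abc001, abc002...abc015 and xyz001, xyz002...xyz015) and then combines the images in a specified range. Most of the time there are more than 10,000 files, and my latest case requires me to combine 24,000 images. Could someone help me with the following:

Take the 2 sets of images from different directories. Currently I have to move these images into 1 directory and then work in that directory.
Read only the specified files. Currently my program reads all files, saves the names in an array (I think it is an array; it could also be a dictionary) and then uses only the images needed for combining. Even if I specify a range of files, it still checks every file in the directory, which takes a lot of time.
Parallel processing - I usually work with 10k files, sometimes more. These are images saved at specific times from the fluid simulations I run. Currently I keep about 2k files at a time in a separate folder and run the program to combine those 2000 files in one go. Then I copy all the output files into a separate folder to bring them together. It would be great if I could use all 16 cores of my processor to combine all the files in one go.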

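Image series 1 is like so. Think of it as a series of photos of a cat walking towards the camera. Each frame carries a suffix 001, 002, ..., n.

Image series 2 is like so. Think of it as a series of photos in which the cat's expression changes with each frame. Each frame carries a suffix 001, 002, ..., n.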

The code currently combines each frame from set1 and set2 to produce output.png, as shown in the link here.

import sys
import os
from PIL import Image

keywords=input('Enter initial characters of image series 1    [Ex:Scalar_ , VoF_Scene_]:\n')
keywords2=input('Enter initial characters of image series 2    [Ex:Scalar_ , VoF_Scene_]:\n')

directory = input('Enter correct folder name where images are present   :\n')  # FOLDER WHERE IMAGES ARE LOCATED

result1 = {}  
result2={}

name_count1=0
name_count2=0
for filename in os.listdir(directory):
    if keywords in filename:
        name_count1 +=1
        result1[name_count1] = os.path.join(directory, filename)
    if keywords2 in filename:
        name_count2 +=1
        result2[name_count2] = os.path.join(directory, filename)

num1=input('Enter initial number of series:\n')
num2=input('Enter final number of series:\n')


num1=int(num1)
num2=int(num2)

if name_count1==(num2-num1+1):
    a1=1
    a2=name_count1
elif name_count2==(num2-num1+1):
    a1=1
    a2=name_count2
else:
    a1=num1
    a2=num2+1

for x in range(a1,a2):
    y=format(x,'05')        # '05' signifies number of digits in the series of file name Ex: [Scalar_scene_1_00345.png --> 5 digits], [Temperature_section_2_951.jpg --> 3 digits]. Change accordingly 
    y=str(y)
    for comparison_name1 in result1:
        for comparison_name2 in result2:
            test1=result1[comparison_name1]
            test2=result2[comparison_name2]
            if y in test1 and y in test2:
                a=test1
                b=test2
                test=[a,b]
                images = [Image.open(x) for x in test]
                widths, heights = zip(*(i.size for i in images))
                total_width = sum(widths)
                max_height = max(heights)

                new_im = Image.new('RGB', (total_width, max_height))

                x_offset = 0
                for im in images:
                    new_im.paste(im, (x_offset,0))
                    x_offset += im.size[0]
                # Save once, after both images have been pasted
                output_name='output'+y+'.png'
                new_im.save(os.path.join(directory, output_name))

Answer 1

You can do this faster without Python, using multiprocessing with either ImageMagick or libvips.

The first part is all setup:

Make 20 images, called a-000.png ... a-019.png, that go from red to blue:

convert -size 64x64 xc:red xc:blue -morph 18 a-%03d.png

Make 20 images, called b-000.png ... b-019.png, that go from yellow to magenta:

convert -size 64x64 xc:yellow xc:magenta -morph 18 b-%03d.png

Now append them side by side into c-000.png ... c-019.png:

for ((f=0;f<20;f++))
do
    z=$(printf "%03d" $f)
    convert a-${z}.png b-${z}.png +append c-${z}.png
done

Those images look like this:

If that looks good, you can do them all in parallel with GNU Parallel:

seq -f "%03g" 0 19 | parallel convert a-{}.png b-{}.png +append c-{}.png

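Benchmark

I did a quick benchmark and made 20,000 images a-00000.png...a-19999.png and another 20,000 images b-00000.png...b-19999.png, each 1200x800 pixels. Then I ran the following command to append each pair horizontally and write 20,000 output images c-00000.png...c-19999.png: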

seq -f "%05g" 0 19999 | parallel --eta convert a-{}.png b-{}.png +append c-{}.png

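That takes 16 minutes on my MacBook Pro with all 12 CPU cores pegged at 100% throughout. Note that, if you wish, you can also:

add spacers between the images,
write annotation onto the images,
add borders,
resize

and do lots of other processing. This is just a simple example.

Note also that you can get faster times, in the region of 10-12 minutes, if you accept JPEG instead of PNG as the output format.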
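For readers who would rather stay in Python, the spacer option mentioned above can be approximated with Pillow. The following is only a rough sketch and not a translation of any ImageMagick command: the function names `side_by_side_layout` and `append_with_gap`, the default gap of 10 pixels and the white background are all illustrative assumptions.

```python
def side_by_side_layout(sizes, gap=0):
    """Return ((canvas_width, canvas_height), x_offsets) for a left-to-right layout.

    sizes is a list of (width, height) tuples; gap is the spacer width in pixels.
    """
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append(x)
        x += width + gap
    # The loop adds one gap too many after the last image, so take it off again.
    canvas_width = x - gap if sizes else 0
    canvas_height = max((h for _w, h in sizes), default=0)
    return (canvas_width, canvas_height), offsets


def append_with_gap(paths, out_path, gap=10, background="white"):
    """Paste the images in paths side by side with a spacer between them."""
    from PIL import Image  # Pillow; imported here so the layout helper runs without it

    images = [Image.open(p) for p in paths]
    try:
        canvas_size, offsets = side_by_side_layout([im.size for im in images], gap)
        canvas = Image.new("RGB", canvas_size, background)
        for im, x in zip(images, offsets):
            canvas.paste(im, (x, 0))
        canvas.save(out_path)
    finally:
        for im in images:
            im.close()
```

Keeping the layout arithmetic in its own function makes it easy to check the canvas size before any image is opened.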

Answer 2

I also made a Python version. It is not as fast, but it may be closer to your heart :-)

#!/usr/bin/env python3

import cv2
import numpy as np
from multiprocessing import Pool

def doOne(params):
    """Append the two input images side-by-side to output the third."""
    imA = cv2.imread(params[0], cv2.IMREAD_UNCHANGED)
    imB = cv2.imread(params[1], cv2.IMREAD_UNCHANGED)
    res = np.hstack((imA, imB))
    cv2.imwrite(params[2], res) 


if __name__ == '__main__':

    # Build the list of jobs - each entry is a tuple with 2 input filenames and an output filename
    jobList = []
    for i in range(1000):
        # Horizontally append a-XXXXX.png to b-XXXXX.png to make c-XXXXX.png
        jobList.append( (f'a-{i:05d}.png', f'b-{i:05d}.png', f'c-{i:05d}.png') )

    # Make a pool of processes - 1 per CPU core    
    with Pool() as pool:
        # Map the list of jobs to the pool of processes
        pool.map(doOne, jobList)
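The question also asks for the two series to live in different directories and for only the requested frame range to be touched. One possible way to do that, sketched here with Pillow (which the question already uses) rather than OpenCV, is to build the file names directly from the frame numbers instead of listing the directory. The helper names `build_jobs`, `merge_pair` and `merge_all`, the `output` name prefix and the `chunksize` value are assumptions made for illustration, and the naming pattern `<prefix><zero-padded frame><ext>` is a guess based on the examples in the question.

```python
import os
from multiprocessing import Pool


def build_jobs(dir1, prefix1, dir2, prefix2, first, last, out_dir, digits=5, ext=".png"):
    """Return (input1, input2, output) path tuples for frames first..last inclusive.

    Assumes files are named <prefix><zero-padded frame><ext>; adjust if yours differ.
    No directory listing is done, so unrelated files are never inspected.
    """
    jobs = []
    for frame in range(first, last + 1):
        tag = f"{frame:0{digits}d}"
        jobs.append((
            os.path.join(dir1, f"{prefix1}{tag}{ext}"),
            os.path.join(dir2, f"{prefix2}{tag}{ext}"),
            os.path.join(out_dir, f"output{tag}.png"),
        ))
    return jobs


def merge_pair(job):
    """Paste the two input images side by side and save the result."""
    from PIL import Image  # Pillow; imported here so build_jobs works without it

    path1, path2, out_path = job
    with Image.open(path1) as im1, Image.open(path2) as im2:
        size = (im1.width + im2.width, max(im1.height, im2.height))
        canvas = Image.new("RGB", size)
        canvas.paste(im1, (0, 0))
        canvas.paste(im2, (im1.width, 0))
        canvas.save(out_path)
    return out_path


def merge_all(jobs, processes=None):
    """Run merge_pair over all jobs; processes=None means one worker per CPU core."""
    with Pool(processes) as pool:
        return pool.map(merge_pair, jobs, chunksize=16)
```

As in the script above, the call belongs under an `if __name__ == '__main__':` guard, for example `merge_all(build_jobs('run1', 'Scalar_', 'run2', 'VoF_Scene_', 1, 24000, 'merged'))`, where the folder names are hypothetical. The output folder must already exist, and a missing input file raises an error in the worker.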

Answer 3

You can do this faster with libvips. To join two images left-right, enter:

vips join left.png right.png result.png horizontal

To test, I made 200 pairs of 1200x800 PNGs like this:

for i in {1..200}; do cp x.png left$i.png; cp x.png right$i.png; done

Then I tried a benchmark:

time parallel vips join left{}.png right{}.png result{}.png horizontal ::: {1..200}
real    0m42.662s
user    2m35.983s
sys     0m6.446s

With imagemagick on the same laptop I see:

time parallel convert left{}.png right{}.png +append result{}.png ::: {1..200}
real    0m55.088s
user    3m24.556s
sys     0m6.400s
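To put the two runs side by side, the `time` figures can be turned into throughput and a wall-clock ratio. The snippet below is simple arithmetic on the numbers quoted above; the helper names `seconds` and `summarize` are made up for this illustration.

```python
def seconds(minutes, secs):
    """Convert the 'XmY.YYYs' format printed by time into seconds."""
    return 60 * minutes + secs


PAIRS = 200  # image pairs joined in each benchmark run

timings = {
    "vips": {"real": seconds(0, 42.662), "user": seconds(2, 35.983), "sys": seconds(0, 6.446)},
    "imagemagick": {"real": seconds(0, 55.088), "user": seconds(3, 24.556), "sys": seconds(0, 6.400)},
}


def summarize(t, pairs=PAIRS):
    """Derive throughput and average busy cores from one set of time figures."""
    cpu = t["user"] + t["sys"]
    return {
        "pairs_per_second": pairs / t["real"],
        "cpu_seconds": cpu,
        "avg_busy_cores": cpu / t["real"],
    }


results = {name: summarize(t) for name, t in timings.items()}
speedup = timings["imagemagick"]["real"] / timings["vips"]["real"]

for name, r in results.items():
    print(f"{name}: {r['pairs_per_second']:.2f} pairs/s, "
          f"{r['cpu_seconds']:.1f} CPU s, {r['avg_busy_cores']:.2f} cores busy")
print(f"wall-clock ratio imagemagick/vips: {speedup:.2f}")
```

With these figures, vips manages roughly 4.7 pairs per second against about 3.6 for ImageMagick, a wall-clock ratio of about 1.29, and both runs keep a little under four cores busy on average.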

parallel-processing

image-processing

python-3.6