Appending PDF files based on multiple values in a dictionary key (or CSV) results in too many pages

Question

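I am trying to generate PDF files based on the county they fall in. If a county has more than one PDF file, I need to append those files into a single file based on the county key. I can't seem to get the maps to append based on the key. The final maps that are generated seem random and often have too many files appended. I'm pretty sure I'm not grouping them correctly. I have read that multiple values in a key can cause multiple occurrences. Can someone tell me how to access each value of each key individually, exactly once? Clearly I am not understanding something crucial.

My code: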

import csv, os
import shutil
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter

merged_file = PdfFileMerger()
counties = {'County4': ['C:\maps\map2.pdf', 'C:\maps\map3.pdf', 'C:\maps\map4.pdf'], 'County1': ['C:\maps\map1.pdf', 'C:\maps\map2.pdf'], 'County3': ['C:\maps\map3.pdf'], 'County2': ['C:\maps\map1.pdf', 'C:\maps\map3.pdf']}
for k, v in counties.items():
    newPdfFile = os.path.join(r'C:\maps\JoinedMaps', k + '.pdf')
    if len(v) > 1:
        for filename in v:
            merged_file.append(PdfFileReader(filename,'rb'))
        merged_file.write(newPdfFile)
    else:
        for filename in v:
            shutil.copyfile(filename, newPdfFile)

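I get four maps as output (which is correct), but the number of "pages" (appended files) in some of the files is way off. As far as I can tell, there is no rhyme or reason to how these pages are appended. The County4 PDF has 3 pages (correct), the County1 PDF has 8 pages instead of 2, the County3 PDF has 1 page (correct), and the County2 PDF has 15 pages instead of 2.

Edit:

It turns out that PyPDF2 doesn't like iterating and creating files using a group-by concept. I imagine it has something to do with how it stores things in memory. The result is that more and more pages are created as you iterate through the key values. I spent several days thinking it was my coding. Good to know it wasn't, I guess, but I'm surprised this information isn't "out there on the internet" more prominently.

My solution was to use arcpy, which doesn't help most users reading this post, sorry.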
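One possible reading of those page counts, offered as a sketch rather than a confirmed diagnosis: the code creates a single merger before the loop and reuses it for every county. The toy model below is not PyPDF2; `ToyMerger` and `page_counts` are hypothetical names, and the assumption that each `write()` re-adds every appended page to one long-lived output is mine. Under that assumption, and treating each map as one page, the model gives 3, 8, 1, and 15 pages for a shared merger, and 3, 2, 1, and 2 when a new merger is created per county.

```python
class ToyMerger:
    """Stand-in for a PDF merger; each 'page' is just a string (an assumption)."""

    def __init__(self):
        self.inputs = []   # pages appended so far
        self.output = []   # pages already pushed to the output by write()

    def append(self, page):
        self.inputs.append(page)

    def write(self):
        # Assumed behaviour: every call re-adds all appended pages to the same output.
        self.output.extend(self.inputs)
        return len(self.output)


counties = {
    'County4': ['map2', 'map3', 'map4'],
    'County1': ['map1', 'map2'],
    'County3': ['map3'],
    'County2': ['map1', 'map3'],
}


def page_counts(counties, shared):
    """Return pages per county, reusing one merger (shared) or one per county."""
    merger = ToyMerger()
    counts = {}
    for county, maps in counties.items():
        if len(maps) == 1:
            counts[county] = 1      # single files are copied, not merged
            continue
        if not shared:
            merger = ToyMerger()    # start from an empty merger for this county
        for m in maps:
            merger.append(m)
        counts[county] = merger.write()
    return counts


shared_counts = page_counts(counties, shared=True)
fresh_counts = page_counts(counties, shared=False)
print(shared_counts)
print(fresh_counts)
```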

For those looking at my solution, my CSV file looks like this:

County1   C:\maps\map1.pdf
County1   C:\maps\map2.pdf
County2   C:\maps\map1.pdf
County2   C:\maps\map3.pdf
County3   C:\maps\map3.pdf
County4   C:\maps\map2.pdf
County4   C:\maps\map3.pdf
County4   C:\maps\map4.pdf

My resulting PDF files look like this:

County-County1 (2 pages - Map1 and Map2)
County-County2 (2 pages - Map1 and Map3)
County-County3 (1 page - Map3)
County-County4 (3 pages - Map2, Map3, and Map4)
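As a small illustration of the grouping step, the rows of such a file can be collected into the dictionary shape used in the question's code. This is only a sketch: `group_by_county` is a hypothetical helper, and it assumes the file is comma-delimited with the county in the first column and the path in the second.

```python
import csv


def group_by_county(rows):
    """Map each county name to the list of PDF paths listed for it, in file order."""
    groups = {}
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        county, path = row[0].strip(), row[1].strip()
        groups.setdefault(county, []).append(path)
    return groups


# Sample rows mirroring the file shown above (comma delimiter is an assumption).
sample_lines = [
    r'County1,C:\maps\map1.pdf',
    r'County1,C:\maps\map2.pdf',
    r'County2,C:\maps\map1.pdf',
    r'County2,C:\maps\map3.pdf',
    r'County3,C:\maps\map3.pdf',
    r'County4,C:\maps\map2.pdf',
    r'County4,C:\maps\map3.pdf',
    r'County4,C:\maps\map4.pdf',
]

groups = group_by_county(csv.reader(sample_lines))
for county, paths in groups.items():
    print(county, len(paths))
```

Each key then appears once, and its list holds exactly the files that belong in that county's output.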

Answer 1

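My data started out as a CSV file, and the code below refers to that rather than the dictionary (generated from the CSV file) that I used in the example above, but you should be able to gather what I did from the code below. I basically dropped the dictionary idea, read the CSV file line by line, and then did the appending with arcpy. PyPDF2 does NOT merge correctly when trying to output multiple files based on a key. Three days of my life I can't get back.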

import csv
import arcpy
from arcpy import env
import shutil, os, glob

# clear out files from destination directory
files = glob.glob(r'C:\maps\JoinedMaps\*')
for f in files:
    os.remove(f)

# open csv file
f = open(r"C:\maps\Maps.csv", "r")
ff = csv.reader(f)

# read the first row so that each later row can be compared with the previous one
# (note: if the first county has only one row, that row is never written out)
pre_line = next(ff)

# Iterate through csv file

for cur_line in ff:
    # new file name and location based on value in column (county name)
    newPdfFile = (r'C:\maps\JoinedMaps\County-' + cur_line[0] +'.pdf')
    # establish pdf files to be appended
    joinFile = pre_line[1]
    appendFile = cur_line[1]

    # If columns in both rows match
    if pre_line[0] == cur_line[0]: # <-- compare first column
        # If destination file already exists, append file referenced in current row
        if os.path.exists(newPdfFile):
            tempPdfDoc = arcpy.mapping.PDFDocumentOpen(newPdfFile)
            tempPdfDoc.appendPages(appendFile)
        # Otherwise create destination and append files referenced in both the previous and current row
        else:
            tempPdfDoc = arcpy.mapping.PDFDocumentCreate(newPdfFile)
            tempPdfDoc.appendPages(joinFile)
            tempPdfDoc.appendPages(appendFile)
        # save and delete temp file
        tempPdfDoc.saveAndClose()
        del tempPdfDoc
    else:
        # if no match, do not merge, just copy
        shutil.copyfile(appendFile,newPdfFile)

    # reset variable
    pre_line = cur_line
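For readers without arcpy, here is a minimal sketch of a PyPDF2-based alternative. It rests on an assumption that is not confirmed anywhere in this thread: that the extra pages come from reusing one `PdfFileMerger` for every county, so the sketch builds a new merger for each key. `merge_by_county` and `make_merger` are hypothetical names. `PdfFileMerger` is the class name used in the question; later releases of the library rename it, so the import may need adjusting.

```python
import os


def _default_merger():
    # Imported lazily so the helper can be exercised without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger  # class name as used in the question
    return PdfFileMerger()


def merge_by_county(counties, out_dir, make_merger=_default_merger):
    """Write one PDF per county, using a separate merger object for each one."""
    written = {}
    for county, paths in counties.items():
        target = os.path.join(out_dir, county + '.pdf')
        merger = make_merger()      # fresh merger: nothing carried over from earlier counties
        try:
            for path in paths:
                merger.append(path)  # append accepts a file path
            merger.write(target)
        finally:
            merger.close()
        written[county] = target
    return written
```

It would be called as `merge_by_county(counties, r'C:\maps\JoinedMaps')` with the dictionary from the question. A county with a single file is simply written out as a one-file merge, so no separate copy branch is needed.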

python

csv

pypdf2