Python Subprocess Loop Runs Twice

Question

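So I wrote a Python script that uses Ghostscript to batch-convert PDF files. It should work in principle, but it doesn't, and I'm not sure why. Right now it processes each input PDF twice, and on the second pass it overwrites the output file.

Here is the script.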

from __future__ import print_function
import os
import subprocess

try:
   os.mkdir('compressed')
except FileExistsError:
   pass   

for root, dirs, files in os.walk("."):
   for file in files:
      if file.endswith(".pdf"):
         filename = os.path.join(root, file)
         arg1= '-sOutputFile=' + './compressed/' + file
         print ("compressing:", file )
         p = subprocess.Popen(['gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4', '-dPDFSETTINGS=/screen', '-dNOPAUSE', '-dBATCH',  '-dQUIET', str(arg1), filename], stdout=subprocess.PIPE).wait()

Here is the output.

I don't know what I'm doing wrong.

Answer 1

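file is only the file name. You can have several files with the same name in different directories, and don't forget that os.walk recurses into subdirectories by default.

So you have to save each converted file in a directory, or under a name, that depends on root.

Also put the output directory outside the current directory, because otherwise os.walk will scan it too.

For example, for a flat output, replace: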

arg1= '-sOutputFile=' + './compressed/' + file

with

arg1= '-sOutputFile=' + '/somewhere/else/compressed/' + root.strip(".").replace(os.sep,"_")+"_"+file

The expression

root.strip(".").replace(os.sep,"_")

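turns root into a "flat" string: the dot of the current directory is stripped and path separators become underscores. The full line above then appends one more underscore before the file name. This is one workable option.

An alternative that does not scan ./compressed or any other subdirectory (possibly closer to what you are looking for) uses os.listdir instead, which does not recurse: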
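A small sketch that wraps the same expression in a hypothetical helper, flat_output_name, shows the names it produces. Note that every name gets a leading underscore, including files found directly in ".".

```python
import os


def flat_output_name(root, file):
    # Hypothetical helper using the same expression as the answer:
    # strip the dots of the current directory from root, turn path
    # separators into underscores, then append "_" + the file name.
    return root.strip(".").replace(os.sep, "_") + "_" + file


# os.path.join is used so the separator matches the current platform.
print(flat_output_name(".", "a.pdf"))
print(flat_output_name(os.path.join(".", "sub", "dir"), "a.pdf"))
```

With these inputs the two calls return "_a.pdf" and "_sub_dir_a.pdf", so same-named files from different folders no longer collide.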

root = "."
for file in os.listdir(root):
    if file.endswith(".pdf"):
        filename = os.path.join(root, file)
        arg1 = '-sOutputFile=' + './compressed/' + file
        print("compressing:", file)
        # ... then run gs with arg1 and filename, as in the question

or os.scandir:

root = "."
for entry in os.scandir(root):
    file = entry.name
    if file.endswith(".pdf"):
        filename = os.path.join(root, file)
        arg1 = '-sOutputFile=' + './compressed/' + file
        print("compressing:", file)
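Building on the os.scandir variant, here is a sketch of a hypothetical list_pdfs helper that also checks entry.is_file(), so a directory whose name happens to end in ".pdf" is skipped. It uses os.scandir as a context manager (Python 3.6+) and only collects names; the Ghostscript call is left out.

```python
import os


def list_pdfs(directory="."):
    # Non-recursive: os.scandir only yields the direct entries of
    # `directory`, so files inside ./compressed are never returned.
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() skips directories such as "compressed" itself.
            if entry.is_file() and entry.name.endswith(".pdf"):
                names.append(entry.name)
    return sorted(names)
```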

Answer 2

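Your problem is that os.walk also retrieves the contents of the "compressed" directory. By the time os.walk lists that directory, your loop has already compressed the files and created them there. You will see this if you add print(os.path.join(root, file)) to your for loop.

Below is a working snippet; it works because the only files it retrieves are those in the current directory.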
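To see the effect without Ghostscript, here is a sketch that reproduces the question's loop in a temporary directory. The hypothetical walk_like_the_question writes an empty placeholder file instead of running gs. Because os.walk goes top-down and only lists compressed/ after the top directory has been handled, the freshly written output is picked up as a second input.

```python
import os
import tempfile


def walk_like_the_question(top):
    # Mimic the question: create compressed/ first, then walk top-down and
    # "convert" each PDF by writing a file with the same name into it.
    out_dir = os.path.join(top, "compressed")
    os.makedirs(out_dir, exist_ok=True)
    seen = []
    for root, dirs, files in os.walk(top):
        for file in files:
            if file.endswith(".pdf"):
                seen.append(os.path.relpath(os.path.join(root, file), top))
                # Placeholder for the Ghostscript call: create the output file.
                open(os.path.join(out_dir, file), "w").close()
    return seen


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "report.pdf"), "w").close()
    print(walk_like_the_question(d))
```

The printed list contains report.pdf twice: once from the top directory and once from inside compressed/.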

import os

os.makedirs("compressed", exist_ok=True)

for file in os.listdir("."):
    if not os.path.isfile(file):
        continue
    if not file.endswith(".pdf"):
        continue
    print(file)

Answer 3

By definition, os.walk descends into subdirectories, so you are compressing the files in the compressed subdirectory a second time.

You probably just want

for file in os.scandir("."):
   ...

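By the way, you almost certainly want to avoid Popen in favor of subprocess.run() or one of its legacy variants (subprocess.call(), check_call(), check_output()).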
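A minimal sketch of the subprocess.run() version, assuming the Ghostscript executable is reachable as gs on the PATH; gs_command and compress are hypothetical names. It also drops stdout=subprocess.PIPE: the Python documentation warns that Popen.wait() can deadlock when a pipe is used and the child writes enough output to fill it.

```python
import os
import subprocess


def gs_command(src, dst):
    # Same Ghostscript arguments as the question's script.
    return ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/screen", "-dNOPAUSE", "-dBATCH", "-dQUIET",
            "-sOutputFile=" + dst, src]


def compress(src, out_dir="compressed"):
    # Output goes to out_dir under the input's base name.
    os.makedirs(out_dir, exist_ok=True)
    dst = os.path.join(out_dir, os.path.basename(src))
    # run() waits for gs to finish; check=True raises CalledProcessError
    # if gs exits with a non-zero status.
    subprocess.run(gs_command(src, dst), check=True)
    return dst
```

Passing the arguments as a list (no shell) keeps file names with spaces intact.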

Answer 4

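On the first iteration of for root, dirs, files in os.walk("."), you find the files in the current directory and compress them into ./compressed/*.pdf.

On the second iteration of the outer loop, os.walk descends into that subdirectory and finds the files you have just compressed.

The simplest fix is to move the output directory outside the input directory (or create an input directory next to the compressed directory and read the files from there instead of from .).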
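A sketch of the second suggestion, assuming sibling directories named input and compressed. The hypothetical plan_outputs only computes (source, destination) pairs. Mirroring each file's sub-path under the output directory is an added choice, not part of this answer, so same-named files in different folders do not overwrite each other.

```python
import os


def plan_outputs(input_dir="input", output_dir="compressed"):
    # Walk only the input tree; because output_dir is a sibling, not a
    # child, os.walk never sees the converted files.
    jobs = []
    for root, dirs, files in os.walk(input_dir):
        for file in files:
            if file.endswith(".pdf"):
                src = os.path.join(root, file)
                # Mirror the sub-path below input_dir inside output_dir.
                rel = os.path.relpath(src, input_dir)
                jobs.append((src, os.path.join(output_dir, rel)))
    return sorted(jobs)
```

Each (src, dst) pair could then be passed to the question's Ghostscript call. Since dst may point into a subfolder, that folder has to exist first, for example via os.makedirs(os.path.dirname(dst), exist_ok=True).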

python

ghostscript

python-3.x