Python: saving output from a for loop and subprocess for checksums

Question

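The purpose of this script is to take the md5 checksum of every file in a directory, treated as the source, and then (I am still working on this part) run the script on the destination to verify that everything was copied correctly.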

#!/usr/bin/env python

import os
from sys import *
import subprocess


script, path = argv

destination = "./new_directorio/"
archivo = "cksum.txt"


def checa_sum(x):
        ck = "md5 %s" % x
        p = subprocess.Popen(ck, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()

        out = open(archivo,'w')
        out.write("%s" % (output))
        out.close()

files = [f for f in os.listdir(path) if os.path.isfile(f)]
for i in files:
        if not "~" in i:
                checa_sum(i)

This gives me a file named "cksum.txt", but the file contains only one result.

bash-3.2$ more cksum.txt
MD5 (victor) = 4703ee63236a6975abab75664759dc29
bash-3.2$

In another attempt, instead of the "open", "write", "close" structure, I used the following:

def checa_sum(x):
    ck = "md5 %s" % x
    p = subprocess.Popen(ck, stdout=subprocess.PIPE, shell=True)
    (output, err) = p.communicate()

    with open(archivo,'w') as outfile:
        outfile.write(output)

Why do I get only one result, when I expect the file to contain the following?

MD5 (pysysinfo.py) = 61a532c898e6f461ef029cee9d1b63dd

MD5 (pysysinfo_func.py) = ac7a1c1c43b2c5e20ceced5ffdecee86

MD5 (pysysinfo_new.py) = 38b06bac21af3d08662d00fd30f6c329

MD5 (test) = b2b0c958ece30c119bd99837720ffde1

MD5 (test_2.py) = 694fb14d86c573fabda678b9d770e51a

MD5 (uno.txt) = 466c9f9d6a879873688b000f7cbe758d

MD5 (victor) = 4703ee63236a6975abab75664759dc29

Also, I don't know how to handle the blank space between the output of each iteration. I am looking into that as well.

Once I have this, I will compare each entry to verify integrity after the files have been copied to the destination.

Answer 1

You keep opening the file with "w", which overwrites it on every call; open it with "a" to append.
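The difference between the two modes can be seen in a small sketch. The temporary file and the sample lines are made up for illustration; only the `open` modes matter here.

```python
import os
import tempfile

# Hypothetical scratch file standing in for cksum.txt.
path = os.path.join(tempfile.mkdtemp(), "cksum.txt")

# Mode "w" truncates the file on every open, so only the last write survives.
for line in ("first\n", "second\n"):
    with open(path, "w") as out:
        out.write(line)
with open(path) as f:
    after_w = f.read()

# Mode "a" keeps the existing content and writes at the end.
for line in ("third\n", "fourth\n"):
    with open(path, "a") as out:
        out.write(line)
with open(path) as f:
    after_a = f.read()

print(repr(after_w))
print(repr(after_a))
```

This is exactly what happens in the question: every call to checa_sum reopens the file with "w", so each result replaces the previous one.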

The best approach is simply to redirect stdout to a file object, for example:

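from subprocess import check_call

def checa_sum(x):
    # Append mode, so each call adds a line instead of overwriting the file.
    with open(archivo,'a') as outfile:
        check_call(["md5",x], stdout=outfile)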

check_call raises a CalledProcessError on a non-zero exit status, so you should handle that accordingly.

To catch the exception:

try:
    check_call(["md5", x], stdout=outfile)
except CalledProcessError as e:  # also import CalledProcessError from subprocess
    print("Exception for {}".format(e.cmd))

Use a generator expression to get the files, and if you want to ignore backup copies, use not f.endswith("~"):

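path = "/home/padraic"
# Join each name with the directory; a bare name is resolved against the
# current working directory, so isfile() would miss files elsewhere.
files = (os.path.join(path, f) for f in os.listdir(path)
         if os.path.isfile(os.path.join(path, f)) and not f.endswith("~"))
for i in files:
    checa_sum(i)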
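Putting these pieces together, one possible variant opens the output file once for the whole loop rather than once per file. This is only a sketch: the function name `write_checksums` and its `cmd` parameter are hypothetical, and it assumes the checksum command prints one result per file to stdout (`md5` on macOS, `md5sum` on GNU/Linux).

```python
import os
import subprocess


def write_checksums(directory, out_path, cmd=("md5",)):
    """Run `cmd <file>` for every regular file in `directory`.

    All command output goes to `out_path`. `cmd` is a hypothetical
    parameter: ("md5",) on macOS, ("md5sum",) on GNU/Linux.
    Returns the list of files for which the command failed.
    """
    failed = []
    # Opened once in "w" mode: every run starts from an empty file, and
    # each child process writes to the same open file in turn.
    with open(out_path, "w") as outfile:
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            # Skip subdirectories and editor backup copies.
            if not os.path.isfile(full) or name.endswith("~"):
                continue
            try:
                subprocess.check_call(list(cmd) + [full], stdout=outfile)
            except subprocess.CalledProcessError:
                failed.append(full)
    return failed
```

Compared with appending, this avoids results from earlier runs piling up in the same file. The output file should live outside the directory being scanned, otherwise it is checksummed as well.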

Answer 2

Ah, someone asked for alternatives, and of course there are some :)

import logging
import hashlib
import os

outfile = "hash.log"
indir = "/Users/daniel/Sites/work"
logging.basicConfig(filename=outfile, filemode="w", format='%(message)s', level=logging.DEBUG)
for name in os.listdir(indir):
    # Build the full path; a bare name is resolved against the current directory.
    filepath = os.path.join(indir, name)
    if os.path.isfile(filepath) and not name.endswith("~"):
        # Binary mode: hashlib works on bytes, not on decoded text.
        with open(filepath, "rb") as checkfile:
            logging.info(hashlib.md5(checkfile.read()).hexdigest())

I have used something similar before.

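I like using the logging module because it makes things extensible, and I don't have to keep the file open or keep reopening it. The logger is highly configurable, but for generating what is needed here, the simple setup is a one-liner.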

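I do no console parsing here, because I use Python's hashlib to generate the md5 of each file. One could argue that doing it this way may be slower, but at least for the file sizes I usually deal with, I have had no problems so far.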

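It would be interesting to test this on larger files; otherwise, the logging mechanism could also be used in your case. I simply preferred hashlib at the time because I don't like parsing console output.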
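For larger files, one common way to keep memory use flat is to feed the file to hashlib in chunks instead of reading it all at once. The sketch below also shows how the source and destination directories from the question might be compared. The names `md5_of_file`, `checksums` and `find_mismatches` and the 1 MiB chunk size are assumptions made for illustration, not part of either answer.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(directory):
    """Map each regular file name in `directory` to its md5."""
    result = {}
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and not name.endswith("~"):
            result[name] = md5_of_file(full)
    return result


def find_mismatches(source_dir, dest_dir):
    """Return source file names that are missing or differ in `dest_dir`."""
    src = checksums(source_dir)
    dst = checksums(dest_dir)
    return sorted(name for name in src if dst.get(name) != src[name])
```

A call such as find_mismatches(path, "./new_directorio/") would return an empty list when every source file has an identical copy at the destination.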
