Python: Continuously check size of files being added to list, stop at size, zip list, continue
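I'm trying to loop through a directory, check the size of each file, and add the files to a list until they reach a certain size (2040 MB). At that point, I want to put the list into a zip archive, then continue looping through the next set of files in the directory and keep doing the same thing. The other constraint is that files with the same name but different extensions need to be added to the zip together and can't be separated. I hope that makes sense.
The problem I'm having is that my code basically ignores the size limit I added and just zips up all the files in the directory.
I suspect there's a logic problem, but I'm not seeing it. Any help would be appreciated. Here is my code: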
```python
import os,os.path, zipfile
from time import *

#### Function to create zip file ####
# Add the files from the list to the zip archive
def zipFunction(zipList):
    # Specify zip archive output location and file name
    # (raw string so the backslashes are not read as escape sequences)
    zipName = r"D:\Documents\ziptest1.zip"
    # Create the zip file object
    zipA = zipfile.ZipFile(zipName, "w", allowZip64=True)
    # Go through the list and add files to the zip archive
    for w in zipList:
        # Create the arcname parameter for the .write method. Otherwise the zip file
        # mirrors the directory structure within the zip archive (annoying).
        arcname = w[len(root)+1:]
        # Write the files to a zip
        zipA.write(w, arcname, zipfile.ZIP_DEFLATED)
    # Close the zip process
    zipA.close()
    return

#################################################
#################################################

sTime = clock()
# Set the size counter
totalSize = 0
# Create an empty list for adding files to count MB and make zip file
zipList = []
tifList = []
xmlList = []
# Specify the directory to look at (raw string: in "Y:\test" the \t would be a tab)
searchDirectory = r"Y:\test"
# Create a counter to check number of files
count = 0
# Set the root, directory, and file name
for root,direc,f in os.walk(searchDirectory):
    #Go through the files in directory
    for name in f:
        # Set the os.path file root and name
        full = os.path.join(root,name)
        # Split the file name from the file extension
        n, ext = os.path.splitext(name)
        # Get size of each file in directory, size is obtained in BYTES
        fileSize = os.path.getsize(full)
        # Add up the total sizes for all the files in the directory
        totalSize += fileSize
        # Convert from bytes to megabytes
        # 1 kilobyte = 1,024 bytes
        # 1 megabyte = 1,048,576 bytes
        # 1 gigabyte = 1,073,741,824 bytes
        megabytes = float(totalSize)/float(1048576)
        if ext == ".tif": # should be everything that is not equal to XML (could be TIF, PDF, etc.) need to fix this later
            tifList.append(n)#, fileSize/1048576])
            tifSorted = sorted(tifList)
        elif ext == ".xml":
            xmlList.append(n)#, fileSize/1048576])
            xmlSorted = sorted(xmlList)
        if full.endswith(".xml") or full.endswith(".tif"):
            zipList.append(full)
        count +=1
        if megabytes == 2040 and len(tifList) == len(xmlList):
            zipFunction(zipList)
        else:
            continue

eTime = clock()
elapsedTime = eTime - sTime
print "Run time is %s seconds"%(elapsedTime)
```
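The only thing I can think of is that there is never an instance where my variable `megabytes == 2040`. I don't know how to make the code stop at that point, though; I wonder whether using a range would work? I also tried: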
```python
if megabytes < 2040:
    zipList.append(full)
    continue
elif megabytes == 2040:
    zipFunction(zipList)
```
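To see why the `== 2040` test never fires: the running total grows in jumps of whole file sizes, so it practically never lands on exactly 2040.0 MB. A minimal sketch with made-up sizes (not from the question), where the hypothetical helper `first_trigger` compares an equality test with a `>=` test:

```python
MB = 1048576      # bytes per megabyte, as in the question
LIMIT_MB = 2040

# Hypothetical file sizes in bytes (not taken from the question).
sizes = [700 * MB, 900 * MB, 500 * MB + 12345, 300 * MB]


def first_trigger(sizes, use_equality):
    """Return the index of the file at which the size check first fires, or None."""
    total = 0
    for i, size in enumerate(sizes):
        total += size
        megabytes = float(total) / MB
        hit = megabytes == LIMIT_MB if use_equality else megabytes >= LIMIT_MB
        if hit:
            return i
    return None


print(first_trigger(sizes, use_equality=True))   # None: the total jumps from 1600 MB past 2040 MB
print(first_trigger(sizes, use_equality=False))  # 2
```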
Your main problem is that you need to reset the file size count when you archive the current list of files. For example:
```python
if megabytes >= 2040:
    zipFunction(zipList)
    totalSize = 0
    zipList = []  # also start a fresh list for the next archive
```
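A minimal sketch of that fix, assuming the files are already available as hypothetical `(path, size_in_bytes)` pairs so it runs without touching the disk; appending to `batches` stands in for calling `zipFunction`. Both the counter and the list are reset, and the leftover files at the end still get a batch:

```python
MB = 1048576
LIMIT_MB = 2040


def batch_after_adding(files, limit_mb=LIMIT_MB):
    """Split (path, size_in_bytes) pairs into batches, checking the size AFTER adding."""
    batches = []
    zip_list = []
    total_size = 0
    for path, size in files:
        zip_list.append(path)
        total_size += size
        if float(total_size) / MB >= limit_mb:
            batches.append(zip_list)   # stands in for zipFunction(zipList)
            zip_list = []              # start a new list for the next archive
            total_size = 0             # reset the size counter
    if zip_list:
        batches.append(zip_list)       # leftover files still need an archive
    return batches


files = [("a.tif", 1500 * MB), ("b.tif", 600 * MB), ("c.tif", 100 * MB)]
print(batch_after_adding(files))  # [['a.tif', 'b.tif'], ['c.tif']]
```

Note that the first batch here holds 2100 MB: because the check happens after a file is added, the file that crosses the limit lands in the archive being closed.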
By the way, you don't need the

```python
else:
    continue
```

there, since it's at the end of the loop.
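As for the constraint that files with the same base name but different extensions must be kept together, the only easy way to handle it is to sort the file names before processing them.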
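One way to do that sort, sketched with `itertools.groupby` over the names of a single directory (the helper `group_by_stem` and the sample names are hypothetical). Sorting on the base name rather than the full name matters for names with extra dots: a plain `sorted()` puts `a.x.tif` between `a.tif` and `a.xml`, which would split the `a` group.

```python
import itertools
import os


def group_by_stem(names):
    """Sort file names and group those that share a base name (extension stripped)."""
    def stem(name):
        return os.path.splitext(name)[0]

    # Sort by (stem, full name) so equal stems are adjacent and order within a group is stable.
    ordered = sorted(names, key=lambda name: (stem(name), name))
    return [list(group) for _, group in itertools.groupby(ordered, key=stem)]


names = ["b.xml", "a.tif", "b.tif", "a.xml", "c.tif"]
print(group_by_stem(names))  # [['a.tif', 'a.xml'], ['b.tif', 'b.xml'], ['c.tif']]
```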
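If you want to guarantee that the total file size in each archive stays under the limit, you need to test the size before adding the file to the list. For example: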
```python
if (totalSize + fileSize) // 1048576 > 2040:
    zipFunction(zipList)
    totalSize = 0
    zipList = []
totalSize += fileSize
```
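The same batching with the test moved before the append, as a sketch using the same hypothetical `(path, size)` input. It compares in bytes because `// 1048576` floors: a total of 2040.5 MB becomes 2040 and still passes a `> 2040` test.

```python
MB = 1048576
LIMIT_MB = 2040


def batch_before_adding(files, limit_mb=LIMIT_MB):
    """Split (path, size_in_bytes) pairs so that no batch exceeds limit_mb."""
    limit_bytes = limit_mb * MB
    batches = []
    zip_list = []
    total_size = 0
    for path, size in files:
        # Close the current batch first if this file would push it over the limit.
        if zip_list and total_size + size > limit_bytes:
            batches.append(zip_list)   # stands in for zipFunction(zipList)
            zip_list = []
            total_size = 0
        zip_list.append(path)
        total_size += size
    if zip_list:
        batches.append(zip_list)
    return batches


files = [("a.tif", 1500 * MB), ("b.tif", 600 * MB), ("c.tif", 100 * MB)]
print(batch_before_adding(files))  # [['a.tif'], ['b.tif', 'c.tif']]
```

A single file larger than the limit still ends up alone in its own batch, since it can't be split.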
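That logic needs a slight modification to keep a group of files together: add the sizes of all the files in the group into a subtotal, then check whether adding that subtotal to `totalSize` would exceed the limit.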
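Putting the pieces together, a minimal sketch under these assumptions: Python 3; only `.tif` and `.xml` files are archived, as in the question's code; files are grouped by base name within each directory; the limit applies to the uncompressed file sizes, not to the finished zip; and the helpers `iter_groups`, `write_zip`, `make_archives` and the output names `archive_001.zip`, `archive_002.zip`, ... are hypothetical. Numbering the archives also avoids reusing the fixed `ziptest1.zip` name, which mode `"w"` would overwrite on every call.

```python
import itertools
import os
import zipfile

MB = 1048576


def iter_groups(search_directory, extensions=(".tif", ".xml")):
    """Yield lists of full paths that share a base name within one directory."""
    def stem(name):
        return os.path.splitext(name)[0]

    for root, _dirs, files in os.walk(search_directory):
        wanted = [n for n in files if os.path.splitext(n)[1] in extensions]
        ordered = sorted(wanted, key=lambda n: (stem(n), n))
        for _, group in itertools.groupby(ordered, key=stem):
            yield [os.path.join(root, n) for n in group]


def write_zip(paths, zip_path, base_dir):
    """Write one archive, storing each file under its path relative to base_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for path in paths:
            zf.write(path, os.path.relpath(path, base_dir))


def make_archives(search_directory, output_dir, limit_bytes=2040 * MB):
    """Pack file groups into numbered archives without ever splitting a group."""
    written = []
    batch, batch_size = [], 0

    def flush(paths):
        # Hypothetical naming scheme: archive_001.zip, archive_002.zip, ...
        zip_path = os.path.join(output_dir, "archive_%03d.zip" % (len(written) + 1))
        write_zip(paths, zip_path, search_directory)
        written.append(zip_path)

    for group in iter_groups(search_directory):
        # Subtotal for the whole group, checked BEFORE the group is added.
        subtotal = sum(os.path.getsize(p) for p in group)
        if batch and batch_size + subtotal > limit_bytes:
            flush(batch)
            batch, batch_size = [], 0
        batch.extend(group)
        batch_size += subtotal

    if batch:
        flush(batch)  # the last, partly filled archive
    return written
```

For example, `make_archives(r"Y:\test", r"D:\Documents")` would return the paths of the archives it wrote. Using `os.path.relpath` for the stored names also replaces the question's `w[len(root)+1:]`, which relies on the global `root` from whatever directory `os.walk` visited last.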