Untar file in Python script with wildcard
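I am trying to import a tar.gz file from HDFS in a Python script and then untar it. The file is named like 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz and always has the same structure.
In my Python script, I would like to copy it locally and extract the file. I am using the following commands to do this: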
import subprocess
import os
import datetime
import time
today = time.strftime("%Y%m%d")
#Copy tar file from HDFS to local server
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"]
p=subprocess.Popen(args)
p.wait()
#Untar the CSV file
args = ["tar","-xzvf",today + "*"]
p=subprocess.Popen(args)
p.wait()
The import works perfectly, but I am not able to extract the file. I get the following error:
['tar', '-xzvf', '20160822*.tar']
tar (child): 20160822*.tar: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
put: `reportResults.csv': No such file or directory
Can anyone help me?
Thanks a lot!
Try using the shell option:
p=subprocess.Popen(args, shell=True)
From the docs:
If shell is True, the specified command will be executed through the
shell. This can be useful if you are using Python primarily for the
enhanced control flow it offers over most system shells and still want
convenient access to other shell features such as shell pipes,
filename wildcards, environment variable expansion, and expansion of ~
to a user’s home directory.
And note:
However, note that Python itself offers implementations of many
shell-like features (in particular, glob, fnmatch, os.walk(),
os.path.expandvars(), os.path.expanduser(), and shutil).
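As a rough sketch of the glob route the docs mention: a wildcard inside an argument list reaches tar literally, because no shell is there to expand it, so one option is to expand the pattern in Python and hand tar a real file name. The helper names build_tar_commands and untar_matching and the directory parameter are made up for this illustration, and the sketch assumes the archives end in .tar.gz.

```python
import glob
import os
import subprocess


def build_tar_commands(prefix, directory="."):
    """Return one tar argument list per archive matching <prefix>*.tar.gz."""
    pattern = os.path.join(directory, prefix + "*.tar.gz")
    # glob expands the wildcard in Python, so no shell is needed
    return [["tar", "-xzvf", path, "-C", directory]
            for path in sorted(glob.glob(pattern))]


def untar_matching(prefix, directory="."):
    """Run tar once per matching archive (hypothetical helper)."""
    for command in build_tar_commands(prefix, directory):
        # check_call raises CalledProcessError if tar exits non-zero
        subprocess.check_call(command)
```

With the date prefix from the question, the call would be untar_matching(time.strftime("%Y%m%d")).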
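In addition to @martriay's answer, you also have a typo: you wrote "20160822*.tar", while your file pattern is "20160822*.tar.gz".
When applying shell=True, the command should be passed as a single string (see the documentation), like this: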
p=subprocess.Popen('tar -xzvf 20160822*.tar.gz', shell=True)
If you don't need p, you can simply use subprocess.call:
subprocess.call('tar -xzvf 20160822*.tar.gz', shell=True)
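If the date prefix is computed at run time, as in the question, the command string has to be assembled first. A minimal sketch, assuming a POSIX shell. The helper names tar_shell_command and untar_with_shell are invented for this example. The prefix is quoted with shlex.quote, while the wildcard is left unquoted so the shell can still expand it.

```python
import shlex
import subprocess


def tar_shell_command(prefix):
    """Build a single command string suitable for shell=True."""
    # Quote only the prefix; the trailing *.tar.gz must stay unquoted
    # so that the shell expands it.
    return "tar -xzvf " + shlex.quote(prefix) + "*.tar.gz"


def untar_with_shell(prefix):
    """Run the command through the shell and return tar's exit code."""
    return subprocess.call(tar_shell_command(prefix), shell=True)
```

One caveat with this approach: if the pattern matches more than one archive, GNU tar treats the extra names as members to extract from the first archive, so it is only safe when a single file matches.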
However, I would suggest using more of the standard library, like this:
import glob
import tarfile
today = "20160822" # compute your common prefix here
target_dir = "/tmp" # choose wherever you want to extract the content
for targz_file in glob.glob('%s*.tar.gz' % today):
    with tarfile.open(targz_file, 'r:gz') as opened_targz_file:
        opened_targz_file.extractall(target_dir)
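The same loop could be wrapped in a function so the script can tell whether anything was actually extracted. This is only a sketch. The name extract_matching and its source_dir and target_dir parameters are assumptions made for the example, not part of tarfile.

```python
import glob
import os
import tarfile


def extract_matching(prefix, source_dir=".", target_dir="."):
    """Extract every <prefix>*.tar.gz found in source_dir into target_dir.

    Returns the list of archive paths that were extracted.
    """
    extracted = []
    pattern = os.path.join(source_dir, prefix + "*.tar.gz")
    for path in sorted(glob.glob(pattern)):
        # The context manager closes the archive even if extraction fails
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(target_dir)
        extracted.append(path)
    return extracted
```

An empty return value means no file matched the prefix, which is worth checking before the script goes on to read the extracted CSV.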
I found a way to do what I needed: instead of using OS commands, I used Python's tarfile module, and it works!
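import glob
import os
import tarfile

# Scan this folder for archives and extract each one in place
os.chdir("/folder_to_scan/")
for file in glob.glob("*.tar.gz"):
    print(file)
    # The context manager closes the archive after extraction
    with tarfile.open(file) as tar:
        tar.extractall()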
Hope this helps.
Regards,
Majid