How to download and extract a tar file with Python
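Apologies for a very basic question, but I'm at my wits' end. I can't figure out how to download and extract an .xz tar file with Python. See the example code below, in which I try several ways of downloading and extracting two different tars (just to make sure I'm not testing against a tar that is actually malformed). All of them fail:
System info: macOS 10.15.7, Python 3.9.9
Code: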
#!/usr/bin/env python3
import requests
import tarfile
import subprocess
from functools import partial
from shutil import copyfileobj

example_url_1 = 'https://downloads.raspberrypi.org/raspios_lite_armhf/images/raspios_lite_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf-lite.img.xz'
example_url_2 = 'https://cdimage.ubuntu.com/releases/22.04/release/ubuntu-22.04-preinstalled-server-arm64+raspi.img.xz'


def _try_extracting(filename: str):
    try:
        with tarfile.open(filename, mode='r:xz') as tf:
            tf.extractall(filename[:-3])
        print(f'Extracting {filename} succeeded')
    except Exception as e:
        print(f'Extracting {filename} failed: ', e)
        print('Falling back to extracting via command-line `tar` tool')
        p = subprocess.run(['tar', 'xf', filename], capture_output=True)
        print(p.stdout)
        print(p.stderr)
        print()
        # https://askubuntu.com/a/843803
        print('Also attempting use of command-line `ar` tool')
        p = subprocess.run(['ar', '-x', filename], capture_output=True)
        print(p.stdout)
        print(p.stderr)
        print()
    print('=========')


def _download_and_extract_in_various_ways(url, prefix):
    print('=========================================\n=========================================')
    print(f'Operating on {url}')

    print('Downloading xz via copyfileobj from requests')
    copyfileobj_download_request = requests.get(url, stream=True)
    content_length = int(copyfileobj_download_request.headers.get('content-length'))
    #
    copyfileobj_download_request.raw.read = partial(copyfileobj_download_request.raw.read, decode_content=True)
    # Here I would insert `tqdm.wrapattr(copyfileobj_download_request.raw, "read", total=content_length) as tq:`
    with open(prefix + '_tar_downloaded_with_copyfileobj.xz', 'wb') as f:
        copyfileobj(copyfileobj_download_request.raw, f)  # This would be `copyfileobj(tq, f)` if fully wrapped
    _try_extracting(prefix + '_tar_downloaded_with_copyfileobj.xz')

    print('Downloading xz via bare request')
    standard_download_request = requests.get(url)
    with open(prefix + '_tar_downloaded_with_requests.xz', 'wb') as f:
        f.write(standard_download_request.content)
    _try_extracting(prefix + '_tar_downloaded_with_requests.xz')

    subprocess.run(['wget', '-O', prefix + '_tar_downloaded_with_wget.xz', url], capture_output=True)
    _try_extracting(prefix + '_tar_downloaded_with_wget.xz')

    subprocess.run(['curl', '-o', prefix + '_tar_downloaded_with_curl.xz', url], capture_output=True)
    _try_extracting(prefix + '_tar_downloaded_with_curl.xz')


def main():
    _download_and_extract_in_various_ways(example_url_1, 'raspbian')
    _download_and_extract_in_various_ways(example_url_2, 'ubuntu')


if __name__ == '__main__':
    main()
Output:
=========================================
=========================================
Operating on https://downloads.raspberrypi.org/raspios_lite_armhf/images/raspios_lite_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf-lite.img.xz
Downloading xz via copyfileobj from requests
Extracting raspbian_tar_downloaded_with_copyfileobj.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: raspbian_tar_downloaded_with_copyfileobj.xz: Inappropriate file type or format\n'
=========
Downloading xz via bare request
Extracting raspbian_tar_downloaded_with_requests.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: raspbian_tar_downloaded_with_requests.xz: Inappropriate file type or format\n'
=========
Extracting raspbian_tar_downloaded_with_wget.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: raspbian_tar_downloaded_with_wget.xz: Inappropriate file type or format\n'
=========
Extracting raspbian_tar_downloaded_with_curl.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: raspbian_tar_downloaded_with_curl.xz: Inappropriate file type or format\n'
=========
=========================================
=========================================
Operating on https://cdimage.ubuntu.com/releases/22.04/release/ubuntu-22.04-preinstalled-server-arm64+raspi.img.xz
Downloading xz via copyfileobj from requests
Extracting ubuntu_tar_downloaded_with_copyfileobj.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: ubuntu_tar_downloaded_with_copyfileobj.xz: Inappropriate file type or format\n'
=========
Downloading xz via bare request
Extracting ubuntu_tar_downloaded_with_requests.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: ubuntu_tar_downloaded_with_requests.xz: Inappropriate file type or format\n'
=========
Extracting ubuntu_tar_downloaded_with_wget.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: ubuntu_tar_downloaded_with_wget.xz: Inappropriate file type or format\n'
=========
Extracting ubuntu_tar_downloaded_with_curl.xz failed: bad checksum
Falling back to extracting via command-line `tar` tool
b''
b'tar: Error opening archive: Unrecognized archive format\n'
Also attempting use of command-line `ar` tool
b''
b'ar: ubuntu_tar_downloaded_with_curl.xz: Inappropriate file type or format\n'
=========
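If I download these files by visiting the URLs directly in a browser, the resulting files still cannot be extracted with tar xf <file>.
Interestingly, all 8 files downloaded by the script ([copyfileobj, requests, wget, curl] x [raspbian, ubuntu]) can be extracted correctly by "opening" them in the macOS Finder.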
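One thing that may be worth checking, given that Finder can open the files: both URLs end in .img.xz rather than .tar.xz, so the downloads might be plain xz-compressed disk images with no tar layer inside. That is only a guess based on the file names and has not been confirmed. Under that assumption, a minimal sketch using the standard-library lzma module could look like the following. The helper names looks_like_tar_xz and decompress_xz are made up for illustration and are not part of the script above.

```python
import lzma
import shutil
import tarfile


def looks_like_tar_xz(path: str) -> bool:
    """Return True if `path` can be opened as an xz-compressed tar archive."""
    try:
        # Same mode the question uses; tarfile raises a TarError subclass
        # (for example ReadError) when the decompressed data is not a tar.
        with tarfile.open(path, mode="r:xz"):
            return True
    except (tarfile.TarError, lzma.LZMAError, EOFError):
        return False


def decompress_xz(src: str, dst: str) -> None:
    """Stream-decompress a plain .xz file into `dst`.

    copyfileobj copies in chunks, so a multi-gigabyte image is never
    held in memory all at once.
    """
    with lzma.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out)
```

With one of the files from the script, this would be called as decompress_xz('raspbian_tar_downloaded_with_wget.xz', 'raspbian.img') after looks_like_tar_xz returns False for it. If the check returns True instead, the file really is a tar archive and this guess does not apply.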