Python Wget: Check for duplicate files and skip if it exists?

Question

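I'm using wget to download files, and I want to check whether a file already exists before downloading it. I know the CLI version has an option for this (see example).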

# check if file exists
# if not, download
wget.download(url, path)

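With wget, a file can be downloaded without specifying its name. That matters because I don't want to rename files that already have names.

If there is another download method that can check for existing files, please let me know. Thanks!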

Answer 1

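I don't see that option in the Python module.

You could try to guess the file name that will be used (usually the part of the URL after the last slash).

Or you could download the file into a new temporary directory, then check whether that file name already exists in your main directory.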
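A minimal sketch of the file name guess, assuming the name really is the last path segment of the URL. The helper names guess_filename and download_if_missing are hypothetical, and the downloader is passed in as a callable so that wget.download (or anything else that accepts a URL and an output path) can be plugged in. The guess can be wrong when the server redirects or supplies its own file name in a response header.

```python
import os
from urllib.parse import unquote, urlparse


def guess_filename(url, default="download"):
    """Guess the local file name from the last path segment of the URL."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default  # URLs ending in "/" have no last segment


def download_if_missing(url, directory, downloader):
    """Call downloader(url, target) only when the guessed target is absent.

    Returns (target_path, downloaded_flag).
    """
    target = os.path.join(directory, guess_filename(url))
    if os.path.exists(target):
        return target, False  # already there, so the download is skipped
    downloader(url, target)
    return target, True
```

With the wget module this could be called as download_if_missing(url, path, wget.download), since the second argument of wget.download is the output location.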
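The temporary directory idea could look roughly like this. The function name download_via_temp_dir is hypothetical, and the downloader is again an injected callable that receives the URL and a directory and chooses the file name itself, which is how wget.download behaves when its out argument is a directory. This approach keeps the name the downloader picks, but the file is still transferred every time, so it avoids overwriting or renaming rather than saving bandwidth.

```python
import os
import shutil
import tempfile


def download_via_temp_dir(url, directory, downloader):
    """Download into a scratch directory, then keep only files that are new.

    Returns the list of file names that were moved into `directory`.
    """
    moved = []
    with tempfile.TemporaryDirectory() as scratch:
        downloader(url, scratch)  # the downloader decides the file name
        for name in os.listdir(scratch):
            target = os.path.join(directory, name)
            if os.path.exists(target):
                continue  # duplicate: deleted together with the scratch directory
            shutil.move(os.path.join(scratch, name), target)
            moved.append(name)
    return moved
```

With the wget module the call would be download_via_temp_dir(url, path, wget.download).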

Answer 2

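Judging from the source code, wget.download() does not appear to accept any extra options, such as -nc or -N, for skipping the download when the file already exists. Only the CLI version seems to support that.

The function signature: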

def download(url, out=None, bar=bar_adaptive):
    ...

Apart from the progress bar, you can only choose the URL and the output directory.

Answer 3

wget.download() has no such option. The following workaround, which calls the wget command-line tool instead, should do the trick:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
subprocess.run(["wget", "-r", "-nc", "-P", path, url])

If the file already exists, you will get the following message:

File ‘index.html’ already there; not retrieving.

Edit: If you are running on Windows, you also have to include shell=True:

subprocess.run(["wget", "-r", "-nc", "-P", path, url], shell=True)
