How to download a file from a URL to disk and guess the filename

Question

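I'm looking for the right way to download a file from a URL, save it to disk, and work out the filename from either the URL or the response headers.

The solution can be in Python, Node, Ruby or PHP; any one of those is fine with me.

A naive implementation that guesses the filename from the URL is easy, but I need it to work even when there are redirects and the filename does not appear in the URL.

Here are some example URLs and the filenames I expect: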

Example: filename in the URL

URL: http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2010/4/14/1271276213693/Snoop-Dogg-in-2004-001.jpg
Should be saved as: Snoop-Dogg-in-2004-001.jpg

Example: filename plus query parameters in the URL

URL: http://i.imgur.com/mW7vW4j.gif?go=true
Should be saved as: mW7vW4j.gif

Example: redirect, filename in the header

URL: https://api.soundcloud.com/tracks/183721111/download?client_id=b45b1aa10f1ac2941910a7f0d10f8e28
Should be saved as: I Might ft. P-Lo & K Camp.mp3
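For this third case neither URL ends in the desired name; the server sends it in a Content-Disposition header on the final response, typically something like attachment; filename="I Might ft. P-Lo & K Camp.mp3" (the exact header value SoundCloud sends is assumed here, not checked). Python's standard library can parse that header without a third-party package: email.message.Message.get_filename() reads the filename parameter and strips the quotes. A minimal sketch, with filename_from_disposition as a hypothetical helper name:

```python
import os
from email.message import Message
from typing import Optional


def filename_from_disposition(header: Optional[str]) -> Optional[str]:
    """Return the filename from a Content-Disposition value, or None."""
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    name = msg.get_filename()  # unquoted value, or None if there is no filename param
    if not name:
        return None
    # Never trust a server-supplied path: keep only the final component
    return os.path.basename(name) or None
```

If a response carries both filename and filename*= parameters, this sketch makes no attempt to prefer the RFC 6266 filename* value.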

There is more on the redirect case in this question: Ruby - how to download a file if the url is a redirection?

Answer 1

Use the Python requests module. It follows redirects, so the filename can be taken from the final URL in resp.url:

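import os
from urllib.parse import urlsplit

import requests

url = "http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2010/4/14/1271276213693/Snoop-Dogg-in-2004-001.jpg"
# stream=True fetches the body lazily; redirects are followed
resp = requests.get(url, stream=True, allow_redirects=True)
resp.raise_for_status()  # fail before creating an empty file on 4xx/5xx

# Last path segment of the final (post-redirect) URL, without the query string
filename = os.path.basename(urlsplit(resp.url).path)

savepath = ''  # set the folder to save to
filepath = os.path.join(savepath, filename)

with open(filepath, 'wb') as f:
    for chunk in resp.iter_content(chunk_size=1024):
        if chunk:  # skip empty chunks
            f.write(chunk)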
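Answer 1 takes the name only from the final URL, so for the redirect-with-header case it would use the last path segment of the redirect target rather than the name in Content-Disposition. Below is a minimal sketch of a header-first approach that swaps requests for the standard library's urllib.request, which also follows HTTP redirects and whose response headers object (an email.message.Message subclass) supports get_filename(). The names pick_filename and download, and the fallback name "download", are hypothetical choices:

```python
import os
import shutil
from email.message import Message
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def pick_filename(headers: Message, final_url: str, default: str = "download") -> str:
    """Prefer the Content-Disposition filename, then the final URL's last path segment."""
    name = headers.get_filename()  # None if there is no filename parameter
    if not name:
        # urlsplit keeps the query string (?go=true) out of .path
        name = unquote(os.path.basename(urlsplit(final_url).path))
    # Keep only the last component of whatever the server or URL supplied
    name = os.path.basename(name)
    return name or default


def download(url: str, folder: str = ".") -> str:
    """Download url into folder and return the path that was written."""
    # urlopen follows HTTP redirects; geturl() reports the final URL
    with urlopen(url) as resp:
        filename = pick_filename(resp.headers, resp.geturl())
        path = os.path.join(folder, filename)
        with open(path, "wb") as out:
            shutil.copyfileobj(resp, out)  # copies in chunks, not all at once
    return path
```

For example, download("http://i.imgur.com/mW7vW4j.gif?go=true") should write mW7vW4j.gif into the current directory, assuming the server sends no Content-Disposition header. Like Answer 1, it overwrites an existing file of the same name.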

Answer 2

Ruby, using the Mechanize gem, simple case:

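require 'mechanize'

url = 'http://i.imgur.com/mW7vW4j.gif?go=true'  # any of the example URLs
agent = Mechanize.new
agent.get(url).save  # follows redirects; name comes from Content-Disposition or the URL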

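This even follows redirects and saves the file under the correct name. For the second example it converts the HTTP query string into a valid filename instead of dropping it. If you want to remove the query string altogether (warning: it may be needed to uniquely identify the resource), you may have to adjust it like this: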

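require 'mechanize'
agent = Mechanize.new
page = agent.get(url)  # follows redirects; page.uri is the final URI
if page.response['content-disposition'] || page.uri.query.nil?
  page.save  # let Mechanize name it from the header or a query-free URL
else
  page.save_as(File.basename(page.uri.path))  # drop the query string
end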
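The warning about query strings matters in practice: once the query is dropped, http://example.com/img?id=1 and http://example.com/img?id=2 both map to the name img, and the second download would replace the first. One way to avoid silently overwriting is to number the name when the target already exists. A minimal Python sketch; the helper free_path and its "-1", "-2" suffix scheme are assumptions for illustration, not Mechanize's behavior:

```python
import os


def free_path(folder: str, filename: str) -> str:
    """Return a path in folder that does not exist yet, numbering the name if needed."""
    stem, ext = os.path.splitext(filename)  # "a.gif" -> ("a", ".gif")
    candidate = os.path.join(folder, filename)
    n = 1
    while os.path.exists(candidate):
        # Insert the counter before the extension: a-1.gif, a-2.gif, ...
        candidate = os.path.join(folder, f"{stem}-{n}{ext}")
        n += 1
    return candidate
```

There is a small race between the existence check and opening the file; opening the result with mode "xb" makes a collision surface as FileExistsError instead of an overwrite.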

