Is there a better way to retrieve webpage sizes with Python?

Question

I'd like a sanity check on this Python script. My goal is to input a list of URLs and get a byte size for each one, which tells me whether the URL is good or bad.

import urllib2
import shutil

urls = (LIST OF URLS)

def getUrl(urls):
    for url in urls:
        file_name = url.replace('https://','').replace('.','_').replace('/','_')
        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.args
        print urls, len(response.read())
        with open(file_name,'wb') as out_file:
            shutil.copyfileobj(response, out_file)
getUrl(urls)

The problem I'm having is that my output looks like:

(LIST OF URLS) 22511
(LIST OF URLS) 56472
(LIST OF URLS) 8717
...

How would I make only one url appear with the byte size?
Is there a better way to get these results?

Answer 1

Try

print url, len(response.read())

instead of

print urls, len(response.read())

You are printing the whole list on every iteration. Print only the current item.
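To put the fix in context, here is a minimal sketch of the corrected loop. It assumes Python 3 (`urllib.request` and `urllib.error` in place of `urllib2`), which is not what the question uses, and the names `page_size` and `report_sizes` are made up for illustration. It also skips URLs that fail, since in the question's code `response` is not set when `urlopen` raises.

```python
import urllib.error
import urllib.request


def page_size(url, timeout=10):
    """Return the body size of url in bytes, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # read() consumes the body, so keep the length from this one call.
            return len(response.read())
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print(url, "HTTP error", e.code)
    except urllib.error.URLError as e:
        print(url, "URL error", e.reason)
    return None


def report_sizes(urls):
    """Print each URL next to its own size and return the sizes as a dict."""
    sizes = {}
    for url in urls:
        size = page_size(url)
        if size is not None:
            print(url, size)  # the current item, not the whole list
        sizes[url] = size
    return sizes
```

A `None` entry in the returned dict marks a URL that could not be fetched, which covers the "good or bad" check from the question.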

There are some alternative ways to determine page size described here and here; there is no point in my duplicating that information here.

Edit

Perhaps you would consider using requests instead of urllib2.

You can easily extract just the content-length from a HEAD request and avoid a full GET. For example:

import requests

h = requests.head('http://www.google.com')

print h.headers['content-length']
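One caveat: a server is not required to send a Content-Length header, so indexing the headers directly can raise a KeyError. Here is a small sketch of a more defensive lookup, written for Python 3. The helper name `content_length` is hypothetical, and it works on any mapping with a `get` method rather than on requests specifically.

```python
def content_length(headers):
    """Return the Content-Length value as an int, or None if missing or malformed."""
    # requests exposes headers as a case-insensitive mapping; with a plain
    # dict the key must match exactly, so lowercase is used here.
    value = headers.get("content-length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None
```

With requests this could be called as `content_length(requests.head(url, allow_redirects=True).headers)`. Note that `requests.head` does not follow redirects unless `allow_redirects=True` is passed.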

HEAD requests using urllib2 or httplib2 are detailed here.
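For reference, on Python 3 a HEAD request needs only the standard library, because `urllib.request.Request` accepts a `method` argument. The following is a rough sketch under that assumption, not the approach from the linked answer; the function name `head_content_length` is made up for illustration, and HTTP errors are left to propagate to the caller.

```python
import urllib.request


def head_content_length(url, timeout=10):
    """Send a HEAD request and return Content-Length as an int, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Header lookup on the response is case-insensitive.
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

If the result is `None`, the server did not report a length and a full GET is the only way to measure the page.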

Answer 2

How would I make only one url appear with the byte size?

Obviously: don't

print urls, ...

but

print url, ...
