What's the best chunk size for a urllib2.urlopen read?
I'm using this code to download mp3 podcasts.
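```python
import urllib2

req = urllib2.urlopen(item)       # item: URL of the podcast episode
CHUNK = 16 * 1024                 # read 16 KB per call
with open(local_file, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:             # empty string means the response is done
            break
        fp.write(chunk)
```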
This works perfectly, but what is the optimal chunk size for the best download performance?
If it makes a difference, I'm on a 6 Mbit ADSL connection.
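For reference, the standard library already wraps this read/write loop: `shutil.copyfileobj(fsrc, fdst, length)` copies one file-like object into another, reading at most `length` bytes per call, so the chunk-size question simply becomes the value of `length`. The sketch below is Python 3 and uses in-memory `io.BytesIO` objects as stand-ins for the HTTP response and the output file so it runs offline; the helper name `save_stream` is hypothetical.

```python
import io
import shutil

CHUNK = 16 * 1024  # same 16 KB chunk size as the loop above


def save_stream(src, dst, chunk_size=CHUNK):
    """Hypothetical helper: copy everything from the readable file-like
    object `src` into `dst`, reading at most `chunk_size` bytes per call."""
    shutil.copyfileobj(src, dst, chunk_size)


# Offline stand-ins for the HTTP response and the output file.
payload = bytes(range(256)) * 1000  # 256,000 bytes of dummy data
response = io.BytesIO(payload)
out_file = io.BytesIO()
save_stream(response, out_file)
print(len(out_file.getvalue()), "bytes copied")
```

With a real download, `src` would be the object returned by `urllib2.urlopen(item)` (or `urllib.request.urlopen` in Python 3) and `dst` the file opened with `open(local_file, 'wb')`. `copyfileobj` does not choose the chunk size for you.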
A good buffer size is the same size your OS kernel uses for its socket buffers. That way, you won't perform more reads than necessary.
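To make the "number of reads" argument concrete, here is the arithmetic for the download loop from the question as a small Python 3 sketch. It counts Python-level `read()` calls only, assuming each call returns a full chunk until the last one; how many kernel `recv()` calls sit behind each `read()` depends on the HTTP library's own buffering. The 5 MB body size is an assumption chosen for illustration, and `read_calls` is a hypothetical helper.

```python
def read_calls(total_bytes, chunk_size):
    """Number of read() calls the `while True` loop makes for a body of
    `total_bytes`: one per full or partial chunk, plus the final call that
    returns an empty string and ends the loop."""
    full_or_partial = -(-total_bytes // chunk_size)  # ceiling division
    return full_or_partial + 1


FIVE_MB = 5 * 1024 * 1024  # assumed body size, roughly one podcast episode
for kb in (1, 6, 16, 64):
    print(kb, "KB chunks ->", read_calls(FIVE_MB, kb * 1024), "read() calls")
```

Going from 1 KB to 16 KB chunks cuts the loop from thousands of iterations to a few hundred; beyond that the savings shrink quickly.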
On GNU/Linux, the default socket receive buffer size (in bytes) can be read from the file `/proc/sys/net/core/rmem_default`.
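Reading that value from Python is a single file read. A minimal Python 3 sketch, assuming you want to fall back to the question's 16 KB when the file is absent (for example on macOS or Windows); `kernel_default_chunk` is a hypothetical helper name. Note that, per tcp(7), TCP sockets on Linux take their initial default from the middle value of `/proc/sys/net/ipv4/tcp_rmem` rather than from `rmem_default`, and the kernel autotunes from there, so treat the number as a rough starting point.

```python
RMEM_DEFAULT = "/proc/sys/net/core/rmem_default"  # Linux only


def kernel_default_chunk(path=RMEM_DEFAULT, fallback=16 * 1024):
    """Hypothetical helper: return the kernel's default socket receive-buffer
    size in bytes, read from `path`, or `fallback` if the file is missing or
    cannot be parsed (e.g. on macOS or Windows)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return fallback


CHUNK = kernel_default_chunk()
print("Using chunk size:", CHUNK, "bytes")
```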
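You can increase a socket's buffer size by calling `setsockopt` to set the `SO_RCVBUF` option. However, this size is capped by a system limit (`/proc/sys/net/core/rmem_max`), and you need administrator privileges (`CAP_NET_ADMIN`, using the `SO_RCVBUFFORCE` option) to go beyond that limit.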
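A minimal Python 3 sketch of the `setsockopt` call on a plain, unconnected TCP socket. The 256 KB target is an arbitrary assumption. On Linux the kernel caps the request at `rmem_max` and then doubles it to allow for bookkeeping overhead, so the value read back with `getsockopt` usually differs from the one requested. Applying this to the socket behind a `urllib2` response would require access to that underlying socket, which this sketch does not show.

```python
import socket

REQUESTED = 256 * 1024  # assumed target receive-buffer size in bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # tcp(7) advises setting this before connect() so that it takes effect
    # on the connection; this sketch never connects.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
    # On Linux the request is capped at rmem_max and then doubled,
    # so `after` is rarely equal to REQUESTED.
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
finally:
    sock.close()

print("SO_RCVBUF before:", before, "after:", after)
```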
Beyond this point, you would mostly be doing platform-specific things for small gains.
Still, it's a good idea to look through the socket options (see `man 7 socket`, also available online), both for micro-optimizations and just to learn things. :)
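Since there is no single sweet spot that is always the most efficient, you should benchmark every tweak to check whether your change is actually beneficial. Have fun!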
To expand further on my comment to @giant_teapot, the code I used for benchmarking was:
```python
#!/usr/bin/env python
# Python 2 (urllib2 and the print statement)
import os
import time
import urllib2

# 5 MB mp3 file
testdl = "http://traffic.libsyn.com/timferriss/Arnold_5_min_-_final.mp3"
numpass = 5  # downloads per chunk size

for chunkmulti in range(1, 207):  # chunk sizes 1 KB .. 206 KB
    CHUNK = chunkmulti * 1024
    passtime = 0
    for _ in range(numpass):
        start = time.time()
        req = urllib2.urlopen(testdl)
        with open("test.mp3", 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
        req.close()
        passtime += time.time() - start
        os.remove("test.mp3")
    # average time per download over the numpass runs
    print "Chunk size multiplier ", chunkmulti, " took ", passtime / numpass, " seconds"
```
The results are inconclusive. Here is the first set of results:
```
Chunk size multiplier 1 took 13.9629709721 seconds
Chunk size multiplier 2 took 8.01173728704 seconds
Chunk size multiplier 3 took 10.3750542402 seconds
Chunk size multiplier 4 took 7.11076325178 seconds
Chunk size multiplier 5 took 11.3685477376 seconds
Chunk size multiplier 6 took 6.86864703894 seconds
Chunk size multiplier 7 took 14.2680369616 seconds
Chunk size multiplier 8 took 7.93746650219 seconds
Chunk size multiplier 9 took 6.81188523769 seconds
Chunk size multiplier 10 took 7.54047352076 seconds
Chunk size multiplier 11 took 6.84347498417 seconds
Chunk size multiplier 12 took 7.88792568445 seconds
Chunk size multiplier 13 took 7.37244099379 seconds
Chunk size multiplier 14 took 8.15134423971 seconds
Chunk size multiplier 15 took 7.1664044857 seconds
Chunk size multiplier 16 took 10.9474172592 seconds
Chunk size multiplier 17 took 7.23868894577 seconds
Chunk size multiplier 18 took 7.66610199213 seconds
```
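Results like these continued all the way up to the largest chunk size tested (just under 207 KB).
So I set the chunk size to 6 KB. Next I might benchmark against wget...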
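The spread in the numbers above (6.8 s to 14.3 s with no trend) suggests that network variance, not chunk size, dominates: if the file is about 5 MB, a 6 Mbit/s line needs roughly 5 × 8 / 6 ≈ 7 s just to deliver it, which is the range most of the timings fall into. To see what the chunk size alone costs on the Python side, one option is to time the same read/write loop against in-memory data and take the median of several runs. The sketch below is Python 3; the 5 MB zero-filled payload, the list of sizes, and the helper names are assumptions. It measures only per-call overhead and says nothing about network throughput.

```python
import io
import statistics
import time

PAYLOAD = b"\0" * (5 * 1024 * 1024)  # assumed stand-in for a ~5 MB mp3


def copy_loop(src, dst, chunk_size):
    """The same read/write loop as in the question."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


def time_chunk_size(chunk_size, repeats=5):
    """Median wall-clock seconds to copy PAYLOAD using `chunk_size`-byte reads."""
    samples = []
    for _ in range(repeats):
        src, dst = io.BytesIO(PAYLOAD), io.BytesIO()
        start = time.perf_counter()
        copy_loop(src, dst, chunk_size)
        samples.append(time.perf_counter() - start)
        if dst.getvalue() != PAYLOAD:  # sanity check: nothing lost
            raise RuntimeError("copy was incomplete")
    return statistics.median(samples)


results = {kb: time_chunk_size(kb * 1024) for kb in (1, 2, 4, 8, 16, 64, 256)}
for kb, secs in results.items():
    print(f"{kb:>4} KB chunks: {secs * 1000:.2f} ms (median)")
```

The median is used rather than the mean so that a single slow run does not skew a row. On a 6 Mbit link, once the per-call overhead is a few milliseconds against several seconds of transfer, the exact chunk size is unlikely to matter much.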