Python readlines faster than read
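This is related to, but not the same as, "In Python, is read() , or readlines() faster?". I have a small file that I want to read many times. I found that reading it with readlines() and joining the lines is faster than reading it with read(). I couldn't find a good explanation, and it puzzles me.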
In [34]: cat test.txt
ATOM 1 N MET A 1 -1.112 -18.674 -30.756 1.00 16.53 N
ATOM 2 CA MET A 1 0.327 -18.325 -30.772 1.00 16.53 C
ATOM 3 C MET A 1 0.513 -16.897 -31.160 1.00 16.53 C
ATOM 4 O MET A 1 -0.063 -15.998 -30.552 1.00 16.53 O
ATOM 5 CB MET A 1 1.083 -19.211 -31.777 1.00 16.53 C
ATOM 6 CG MET A 1 1.101 -20.691 -31.391 1.00 16.53 C
ATOM 7 SD MET A 1 1.989 -21.764 -32.559 1.00 16.53 S
ATOM 8 CE MET A 1 3.635 -21.109 -32.159 1.00 16.53 C
ATOM 9 N LYS A 2 1.333 -16.657 -32.199 1.00146.35 N
ATOM 10 CA LYS A 2 1.595 -15.313 -32.613 1.00146.35 C
In [35]: timeit open("test.txt").read()
10000 loops, best of 3: 58.7 µs per loop
In [36]: timeit "\n".join(open("test.txt").readlines())
10000 loops, best of 3: 56.4 µs per loop
The results are very consistent.
For a file that small, there is no real difference.
For larger files...
import timeit
data = '''
ATOM 1 N MET A 1 -1.112 -18.674 -30.756 1.00 16.53 N
ATOM 2 CA MET A 1 0.327 -18.325 -30.772 1.00 16.53 C
ATOM 3 C MET A 1 0.513 -16.897 -31.160 1.00 16.53 C
ATOM 4 O MET A 1 -0.063 -15.998 -30.552 1.00 16.53 O
ATOM 5 CB MET A 1 1.083 -19.211 -31.777 1.00 16.53 C
ATOM 6 CG MET A 1 1.101 -20.691 -31.391 1.00 16.53 C
ATOM 7 SD MET A 1 1.989 -21.764 -32.559 1.00 16.53 S
ATOM 8 CE MET A 1 3.635 -21.109 -32.159 1.00 16.53 C
ATOM 9 N LYS A 2 1.333 -16.657 -32.199 1.00146.35 N
ATOM 10 CA LYS A 2 1.595 -15.313 -32.613 1.00146.35 C
'''.lstrip()
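names_and_sizes = []
for x in range(1, 10):
    # Write the sample block 9, 17, 33, ... times so each file is roughly twice the size of the previous one.
    reps = 1 + 2 ** (x + 2)
    with open('test_{}.txt'.format(x), 'w') as outf:
        for _ in range(reps):
            outf.write(data)
        # tell() must be called while the file is still open.
        names_and_sizes.append((outf.name, outf.tell()))

for filename, size in names_and_sizes:
    # Total time in seconds for 1000 calls of each approach.
    a = timeit.timeit(lambda: open(filename).read(), number=1000)
    b = timeit.timeit(lambda: "\n".join(open(filename).readlines()), number=1000)
    print(filename, size, a, b)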
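The output (filename, size in bytes, seconds for 1000 read() calls, seconds for 1000 readlines() + join calls) is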
test_1.txt 7290 0.07285173307172954 0.09389211190864444
test_2.txt 13770 0.08125667599961162 0.1290126950480044
test_3.txt 26730 0.08221574104391038 0.17529957089573145
test_4.txt 52650 0.0865904720267281 0.2977212209952995
test_5.txt 104490 0.1046126070432365 0.5687746809562668
test_6.txt 208170 0.1773586180061102 1.1868972890079021
test_7.txt 415530 0.26339677802752703 2.0290830068988726
test_8.txt 830250 0.31897587003186345 4.381448873900808
test_9.txt 1659690 0.6923789769643918 9.483053435920738
Or, more visually (plot of time against file size for both approaches; both axes are logarithmic).
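One detail worth checking before comparing timings: readlines() keeps the trailing newline on every line, so "\n".join(...) produces a string with doubled newlines rather than the same text read() returns; "".join(...) is the like-for-like comparison. The following is a minimal sketch of such a benchmark, not the code used for the numbers above. The helper names (read_whole, read_joined, read_joined_newline, best_time, make_sample), the temporary file, and the repeat counts are all assumptions made for illustration.

```python
import os
import tempfile
import timeit

# One record from the sample file; the helper names below are illustrative only.
LINE = "ATOM 1 N MET A 1 -1.112 -18.674 -30.756 1.00 16.53 N\n"


def read_whole(path):
    """Read the entire file in a single call."""
    with open(path) as f:
        return f.read()


def read_joined(path):
    """Read line by line and rebuild the same text (lines keep their newline)."""
    with open(path) as f:
        return "".join(f.readlines())


def read_joined_newline(path):
    """The variant from the question: this doubles every interior newline."""
    with open(path) as f:
        return "\n".join(f.readlines())


def best_time(func, path, number=200, repeat=5):
    """Return the fastest of several timing runs, in seconds per `number` calls."""
    return min(timeit.repeat(lambda: func(path), number=number, repeat=repeat))


def make_sample(lines=1000):
    """Create a temporary file with `lines` copies of LINE and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w") as f:
        f.write(LINE * lines)
    return path


def main():
    path = make_sample()
    try:
        # Only the "".join variant reproduces what read() returns.
        assert read_joined(path) == read_whole(path)
        assert read_joined_newline(path) != read_whole(path)
        for func in (read_whole, read_joined, read_joined_newline):
            print(func.__name__, best_time(func, path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    main()
```

Taking the minimum of several repeats and closing each file with a with block is meant to reduce noise from leftover open file handles and background activity. Absolute numbers will still vary by machine and Python version.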