LMDB 随机写入大数据真的很慢 (~1MB/s)
LMDB random writes really slow for large data (~1MB/s)
我正在使用 http://www.lmdb.tech/bench/ondisk/ 中给出的基准测试 LMDB 性能。而且我注意到当数据超出内存时,LMDB 随机写入真的很慢。
我正在使用配备 4GB DRAM 和 Intel PCIE SSD 的机器。键大小为 10 个字节,值大小为 1KB。基准代码在http://www.lmdb.tech/bench/ondisk/中给出,我使用的命令行是“./db_bench_mdb --benchmarks=fillrandbatch --threads=1 --stats_interval=1024 --num=10000000 --value_size=1000 --use_existing_db=0 ".
对于写入的前1GB数据,平均写入速率为140MB/s。然后,前 2GB 的速率显着下降到 40MB/s。测试结束,写入10M的值,平均速率仅为3MB/s,即时速率为1MB/s。我知道 LMDB 没有针对写入进行优化,但我没想到它会这么慢,因为我有一个非常高端的 Intel SSD。
我也注意到LMDB访问SSD的方式真的很奇怪。在测试开始时,它以大约 400MB/s 的速度写入 SSD,但不执行读取,这是预期的。但是随着我们写入的数据越来越多,LMDB开始读取SSD。随着时间的推移,读取吞吐量上升,而写入吞吐量显着下降。在测试结束时,LMDB 持续以大约 190MB/s 的速度读取,同时偶尔以大约 10-20 秒的间隔发出 100MB 的写入。
1. LMDB对SSD上存储的数据有如此低的写入吞吐量(测试结束时为1MB/s)是否正常?
2。为什么 LMDB 在测试结束时读取的数据多于写入的数据(每写入 1MB 大约读取 20MB 数据)?
据我了解,虽然我们的数据多于DRAM的容量,但B树的分支节点应该还在DRAM中。因此,对于每次写入,我们需要从 SSD 获取的唯一页面是叶节点。而当我们写叶节点的时候,我们可能还需要写它的父节点。所以写入应该多于读取。但事实证明,LMDB 的阅读远多于写作。我认为这可能是它最后如此缓慢的原因。但是我真的不明白为什么。
供参考,这里是基准测试给出的部分日志:
2018/03/12-10:36:30 ... thread 0: (1024,1024) ops and (54584.2,54584.2) ops/second in (0.018760,0.018760) seconds
2018/03/12-10:36:30 ... thread 0: (1024,2048) ops and (111231.8,73231.8) ops/second in (0.009206,0.027966) seconds
2018/03/12-10:36:30 ... thread 0: (1024,3072) ops and (125382.6,85019.2) ops/second in (0.008167,0.036133) seconds
2018/03/12-10:36:30 ... thread 0: (1024,4096) ops and (206202.2,99661.8) ops/second in (0.004966,0.041099) seconds
2018/03/12-10:36:30 ... thread 0: (1024,5120) ops and (259634.9,113669.2) ops/second in (0.003944,0.045043) seconds
2018/03/12-10:36:30 ... thread 0: (1024,6144) ops and (306495.1,126984.1) ops/second in (0.003341,0.048384) seconds
2018/03/12-10:36:30 ... thread 0: (1024,7168) ops and (339185.2,139447.1) ops/second in (0.003019,0.051403) seconds
2018/03/12-10:36:30 ... thread 0: (1024,8192) ops and (384240.2,151512.9) ops/second in (0.002665,0.054068) seconds
2018/03/12-10:36:30 ... thread 0: (1024,9216) ops and (385252.1,162465.2) ops/second in (0.002658,0.056726) seconds
2018/03/12-10:36:30 ... thread 0: (1024,10240) ops and (371553.0,172152.9) ops/second in (0.002756,0.059482) seconds
...
2018/03/12-10:36:37 ... thread 0: (1024,993280) ops and (70127.4,142518.0) ops/second in (0.014602,6.969505) seconds
2018/03/12-10:36:37 ... thread 0: (1024,994304) ops and (199415.8,142559.9) ops/second in (0.005135,6.974640) seconds
2018/03/12-10:36:37 ... thread 0: (1024,995328) ops and (75953.1,142431.4) ops/second in (0.013482,6.988122) seconds
2018/03/12-10:36:37 ... thread 0: (1024,996352) ops and (200823.7,142474.0) ops/second in (0.005099,6.993221) seconds
2018/03/12-10:36:37 ... thread 0: (1024,997376) ops and (71975.8,142330.8) ops/second in (0.014227,7.007448) seconds
2018/03/12-10:36:37 ... thread 0: (1024,998400) ops and (62117.1,142142.6) ops/second in (0.016485,7.023933) seconds
2018/03/12-10:36:37 ... thread 0: (1024,999424) ops and (36366.2,141720.2) ops/second in (0.028158,7.052091) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1000448) ops and (61914.3,141533.5) ops/second in (0.016539,7.068630) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1001472) ops and (60985.1,141342.6) ops/second in (0.016791,7.085421) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1002496) ops and (60466.5,141149.8) ops/second in (0.016935,7.102356) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1003520) ops and (60189.3,140956.3) ops/second in (0.017013,7.119369) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1004544) ops and (61731.4,140772.1) ops/second in (0.016588,7.135957) seconds
...
2018/03/12-10:40:15 ... thread 0: (1024,3236864) ops and (5620.5,14373.0) ops/second in (0.182189,225.203790) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3237888) ops and (6098.5,14366.9) ops/second in (0.167911,225.371701) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3238912) ops and (5469.5,14359.5) ops/second in (0.187221,225.558922) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3239936) ops and (5593.9,14352.4) ops/second in (0.183056,225.741978) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3240960) ops and (5806.9,14345.7) ops/second in (0.176342,225.918320) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3241984) ops and (5332.9,14338.1) ops/second in (0.192016,226.110336) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3243008) ops and (5532.3,14330.9) ops/second in (0.185096,226.295432) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3244032) ops and (6108.8,14324.8) ops/second in (0.167626,226.463058) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3245056) ops and (6074.7,14318.6) ops/second in (0.168567,226.631625) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3246080) ops and (5615.2,14311.6) ops/second in (0.182362,226.813987) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3247104) ops and (5529.3,14304.5) ops/second in (0.185194,226.999181) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3248128) ops and (5846.2,14298.0) ops/second in (0.175156,227.174337) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3249152) ops and (5741.5,14291.2) ops/second in (0.178351,227.352688) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3250176) ops and (5640.2,14284.3) ops/second in (0.181555,227.534243) seconds
...
2018/03/12-11:30:39 ... thread 0: (1024,9988096) ops and (1917.2,3074.3) ops/second in (0.534112,3248.860552) seconds
2018/03/12-11:30:39 ... thread 0: (1024,9989120) ops and (1858.9,3074.1) ops/second in (0.550851,3249.411403) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9990144) ops and (1922.8,3073.9) ops/second in (0.532557,3249.943960) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9991168) ops and (1857.2,3073.7) ops/second in (0.551382,3250.495342) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9992192) ops and (1851.3,3073.5) ops/second in (0.553130,3251.048472) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9993216) ops and (1941.0,3073.3) ops/second in (0.527568,3251.576040) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9994240) ops and (1923.1,3073.2) ops/second in (0.532461,3252.108501) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9995264) ops and (1987.6,3073.0) ops/second in (0.515200,3252.623701) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9996288) ops and (1931.2,3072.8) ops/second in (0.530233,3253.153934) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9997312) ops and (1918.9,3072.6) ops/second in (0.533633,3253.687567) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9998336) ops and (1999.0,3072.4) ops/second in (0.512246,3254.199813) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9999360) ops and (1853.3,3072.2) ops/second in (0.552533,3254.752346) seconds
fillrandbatch : 325.508 micros/op 3072 ops/sec; 3.0 MB/s
这是从 iostat 以 1 秒的间隔转储的 read/write 速率:
Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sdb 73.00 0.12 25.52 0 25
sdb 531.00 0.00 495.21 0 495
sdb 15089.00 0.00 488.77 0 488
sdb 27431.00 0.01 463.55 0 463
sdb 13093.00 0.00 478.77 0 478
sdb 53676.00 0.00 413.79 0 413
sdb 16781.00 0.00 483.60 0 483
sdb 22267.00 0.00 323.32 0 323
sdb 23945.00 0.00 164.55 0 164
sdb 22867.00 0.00 152.25 0 152
sdb 22038.00 0.00 146.39 0 146
sdb 23825.00 0.00 263.61 0 263
...
sdb 20866.00 85.81 76.90 85 76
sdb 7684.00 101.75 115.19 101 115
sdb 3707.00 154.48 0.00 154 0
sdb 4349.00 181.41 0.00 181 0
sdb 4373.00 184.70 0.00 184 0
sdb 4329.00 185.04 0.00 185 0
sdb 4338.00 182.30 0.01 182 0
sdb 4364.00 184.27 0.00 184 0
sdb 5310.00 177.32 4.99 177 4
sdb 32130.00 99.07 119.70 99 119
sdb 27010.00 103.26 99.25 103 99
sdb 11109.00 67.18 99.96 67 99
sdb 3931.00 172.51 0.00 172 0
sdb 4112.00 171.28 0.00 171 0
sdb 4202.00 183.03 0.00 183 0
sdb 4119.00 183.79 0.00 183 0
sdb 4232.00 182.77 0.02 182 0
sdb 4224.00 185.90 0.00 185 0
sdb 4304.00 186.17 0.00 186 0
sdb 4279.00 188.83 0.00 188 0
sdb 4087.00 184.38 0.00 184 0
sdb 7758.00 163.86 16.70 163 16
sdb 21309.00 68.95 80.11 68 80
sdb 21166.00 81.66 78.42 81 78
sdb 19328.00 71.56 71.55 71 71
sdb 20836.00 89.08 76.52 89 76
sdb 3211.00 112.01 82.21 112 82
sdb 3939.00 173.40 0.00 173 0
sdb 3992.00 178.03 0.00 178 0
sdb 4251.00 181.49 0.00 181 0
sdb 4148.00 185.63 0.00 185 0
sdb 4094.00 184.12 0.01 184 0
sdb 4241.00 187.38 0.00 187 0
sdb 4044.00 186.60 0.00 186 0
sdb 4049.00 185.47 0.00 185 0
sdb 4247.00 189.17 0.00 189 0
...
sdb 17457.00 105.45 64.05 105 64
sdb 16736.00 82.12 62.35 82 62
sdb 12074.00 108.76 66.21 108 66
sdb 2232.00 194.44 0.00 194 0
sdb 2171.00 187.27 0.02 187 0
sdb 2322.00 197.91 0.00 197 0
sdb 2311.00 194.65 0.00 194 0
sdb 2240.00 187.93 0.00 187 0
sdb 2189.00 191.38 0.00 191 0
sdb 2266.00 192.33 0.01 192 0
sdb 2312.00 198.95 0.00 198 0
sdb 2310.00 199.84 0.00 199 0
sdb 2350.00 198.83 0.00 198 0
sdb 2275.00 198.31 0.00 198 0
sdb 3952.00 185.05 6.79 185 6
sdb 15842.00 59.89 59.67 59 59
sdb 16676.00 88.24 61.79 88 61
sdb 14768.00 75.94 55.00 75 54
sdb 5677.00 141.71 35.03 141 35
sdb 2135.00 184.78 0.04 184 0
sdb 2301.00 197.18 0.00 197 0
sdb 2334.00 198.81 0.00 198 0
sdb 2304.00 198.83 0.00 198 0
sdb 2348.00 198.67 0.00 198 0
sdb 2352.00 198.42 0.01 198 0
sdb 2373.00 199.32 0.00 199 0
sdb 2363.00 197.55 0.00 197 0
sdb 2289.00 198.71 0.00 198 0
sdb 2246.00 189.31 0.00 189 0
sdb 2357.00 198.64 0.01 198 0
sdb 2338.00 197.96 0.00 197 0
sdb 6292.00 177.60 16.56 177 16
sdb 19374.00 93.72 72.16 93 72
sdb 16873.00 101.38 62.01 101 62
sdb 16960.00 98.99 76.84 98 76
sdb 2299.00 189.32 6.16 189 6
sdb 2285.00 195.82 0.00 195 0
sdb 2346.00 198.25 0.00 198 0
sdb 2325.00 198.91 0.00 198 0
sdb 2353.00 197.72 0.02 197 0
sdb 2320.00 198.82 0.00 198 0
sdb 2327.00 200.05 0.00 200 0
sdb 2340.00 198.35 0.00 198 0
sdb 2322.00 199.29 0.00 199 0
sdb 2316.00 197.43 0.01 197 0
sdb 690.00 51.17 0.00 51 0
在@Howard Chu 的帮助下,我找到了问题所在。
读取过多是由 OS read-ahead 引起的。默认情况下,当我们读取一个页面时,OS 会将该页面之后的页面预取到内存中。然而,这在数据库中通常是不可取的,因为我们通常进行随机读取。所以我们需要禁用 read-ahead 以获得随机读取的最佳性能。
对于基准测试,有一个命令行选项 --readahead=0,它反过来为 LMDB 设置 MDB_NORDAHEAD 选项。
禁用read-ahead后,即时写入速率在测试结束时从 1MB/s 增加到 8MB/s。并且在 iostat 中观察到的 read/write 数量几乎相同。
我正在使用 http://www.lmdb.tech/bench/ondisk/ 中给出的基准测试 LMDB 性能。而且我注意到当数据超出内存时,LMDB 随机写入真的很慢。
我正在使用配备 4GB DRAM 和 Intel PCIE SSD 的机器。键大小为 10 个字节,值大小为 1KB。基准代码在http://www.lmdb.tech/bench/ondisk/中给出,我使用的命令行是“./db_bench_mdb --benchmarks=fillrandbatch --threads=1 --stats_interval=1024 --num=10000000 --value_size=1000 --use_existing_db=0 ".
对于写入的前1GB数据,平均写入速率为140MB/s。然后,前 2GB 的速率显着下降到 40MB/s。测试结束,写入10M的值,平均速率仅为3MB/s,即时速率为1MB/s。我知道 LMDB 没有针对写入进行优化,但我没想到它会这么慢,因为我有一个非常高端的 Intel SSD。
我也注意到LMDB访问SSD的方式真的很奇怪。在测试开始时,它以大约 400MB/s 的速度写入 SSD,但不执行读取,这是预期的。但是随着我们写入的数据越来越多,LMDB开始读取SSD。随着时间的推移,读取吞吐量上升,而写入吞吐量显着下降。在测试结束时,LMDB 持续以大约 190MB/s 的速度读取,同时偶尔以大约 10-20 秒的间隔发出 100MB 的写入。
1. LMDB对SSD上存储的数据有如此低的写入吞吐量(测试结束时为1MB/s)是否正常?
2。为什么 LMDB 在测试结束时读取的数据多于写入的数据(每写入 1MB 大约读取 20MB 数据)?
据我了解,虽然我们的数据多于DRAM的容量,但B树的分支节点应该还在DRAM中。因此,对于每次写入,我们需要从 SSD 获取的唯一页面是叶节点。而当我们写叶节点的时候,我们可能还需要写它的父节点。所以写入应该多于读取。但事实证明,LMDB 的阅读远多于写作。我认为这可能是它最后如此缓慢的原因。但是我真的不明白为什么。
供参考,这里是基准测试给出的部分日志:
2018/03/12-10:36:30 ... thread 0: (1024,1024) ops and (54584.2,54584.2) ops/second in (0.018760,0.018760) seconds
2018/03/12-10:36:30 ... thread 0: (1024,2048) ops and (111231.8,73231.8) ops/second in (0.009206,0.027966) seconds
2018/03/12-10:36:30 ... thread 0: (1024,3072) ops and (125382.6,85019.2) ops/second in (0.008167,0.036133) seconds
2018/03/12-10:36:30 ... thread 0: (1024,4096) ops and (206202.2,99661.8) ops/second in (0.004966,0.041099) seconds
2018/03/12-10:36:30 ... thread 0: (1024,5120) ops and (259634.9,113669.2) ops/second in (0.003944,0.045043) seconds
2018/03/12-10:36:30 ... thread 0: (1024,6144) ops and (306495.1,126984.1) ops/second in (0.003341,0.048384) seconds
2018/03/12-10:36:30 ... thread 0: (1024,7168) ops and (339185.2,139447.1) ops/second in (0.003019,0.051403) seconds
2018/03/12-10:36:30 ... thread 0: (1024,8192) ops and (384240.2,151512.9) ops/second in (0.002665,0.054068) seconds
2018/03/12-10:36:30 ... thread 0: (1024,9216) ops and (385252.1,162465.2) ops/second in (0.002658,0.056726) seconds
2018/03/12-10:36:30 ... thread 0: (1024,10240) ops and (371553.0,172152.9) ops/second in (0.002756,0.059482) seconds
...
2018/03/12-10:36:37 ... thread 0: (1024,993280) ops and (70127.4,142518.0) ops/second in (0.014602,6.969505) seconds
2018/03/12-10:36:37 ... thread 0: (1024,994304) ops and (199415.8,142559.9) ops/second in (0.005135,6.974640) seconds
2018/03/12-10:36:37 ... thread 0: (1024,995328) ops and (75953.1,142431.4) ops/second in (0.013482,6.988122) seconds
2018/03/12-10:36:37 ... thread 0: (1024,996352) ops and (200823.7,142474.0) ops/second in (0.005099,6.993221) seconds
2018/03/12-10:36:37 ... thread 0: (1024,997376) ops and (71975.8,142330.8) ops/second in (0.014227,7.007448) seconds
2018/03/12-10:36:37 ... thread 0: (1024,998400) ops and (62117.1,142142.6) ops/second in (0.016485,7.023933) seconds
2018/03/12-10:36:37 ... thread 0: (1024,999424) ops and (36366.2,141720.2) ops/second in (0.028158,7.052091) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1000448) ops and (61914.3,141533.5) ops/second in (0.016539,7.068630) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1001472) ops and (60985.1,141342.6) ops/second in (0.016791,7.085421) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1002496) ops and (60466.5,141149.8) ops/second in (0.016935,7.102356) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1003520) ops and (60189.3,140956.3) ops/second in (0.017013,7.119369) seconds
2018/03/12-10:36:37 ... thread 0: (1024,1004544) ops and (61731.4,140772.1) ops/second in (0.016588,7.135957) seconds
...
2018/03/12-10:40:15 ... thread 0: (1024,3236864) ops and (5620.5,14373.0) ops/second in (0.182189,225.203790) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3237888) ops and (6098.5,14366.9) ops/second in (0.167911,225.371701) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3238912) ops and (5469.5,14359.5) ops/second in (0.187221,225.558922) seconds
2018/03/12-10:40:15 ... thread 0: (1024,3239936) ops and (5593.9,14352.4) ops/second in (0.183056,225.741978) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3240960) ops and (5806.9,14345.7) ops/second in (0.176342,225.918320) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3241984) ops and (5332.9,14338.1) ops/second in (0.192016,226.110336) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3243008) ops and (5532.3,14330.9) ops/second in (0.185096,226.295432) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3244032) ops and (6108.8,14324.8) ops/second in (0.167626,226.463058) seconds
2018/03/12-10:40:16 ... thread 0: (1024,3245056) ops and (6074.7,14318.6) ops/second in (0.168567,226.631625) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3246080) ops and (5615.2,14311.6) ops/second in (0.182362,226.813987) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3247104) ops and (5529.3,14304.5) ops/second in (0.185194,226.999181) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3248128) ops and (5846.2,14298.0) ops/second in (0.175156,227.174337) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3249152) ops and (5741.5,14291.2) ops/second in (0.178351,227.352688) seconds
2018/03/12-10:40:17 ... thread 0: (1024,3250176) ops and (5640.2,14284.3) ops/second in (0.181555,227.534243) seconds
...
2018/03/12-11:30:39 ... thread 0: (1024,9988096) ops and (1917.2,3074.3) ops/second in (0.534112,3248.860552) seconds
2018/03/12-11:30:39 ... thread 0: (1024,9989120) ops and (1858.9,3074.1) ops/second in (0.550851,3249.411403) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9990144) ops and (1922.8,3073.9) ops/second in (0.532557,3249.943960) seconds
2018/03/12-11:30:40 ... thread 0: (1024,9991168) ops and (1857.2,3073.7) ops/second in (0.551382,3250.495342) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9992192) ops and (1851.3,3073.5) ops/second in (0.553130,3251.048472) seconds
2018/03/12-11:30:41 ... thread 0: (1024,9993216) ops and (1941.0,3073.3) ops/second in (0.527568,3251.576040) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9994240) ops and (1923.1,3073.2) ops/second in (0.532461,3252.108501) seconds
2018/03/12-11:30:42 ... thread 0: (1024,9995264) ops and (1987.6,3073.0) ops/second in (0.515200,3252.623701) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9996288) ops and (1931.2,3072.8) ops/second in (0.530233,3253.153934) seconds
2018/03/12-11:30:43 ... thread 0: (1024,9997312) ops and (1918.9,3072.6) ops/second in (0.533633,3253.687567) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9998336) ops and (1999.0,3072.4) ops/second in (0.512246,3254.199813) seconds
2018/03/12-11:30:44 ... thread 0: (1024,9999360) ops and (1853.3,3072.2) ops/second in (0.552533,3254.752346) seconds
fillrandbatch : 325.508 micros/op 3072 ops/sec; 3.0 MB/s
这是从 iostat 以 1 秒的间隔转储的 read/write 速率:
Device tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sdb 73.00 0.12 25.52 0 25
sdb 531.00 0.00 495.21 0 495
sdb 15089.00 0.00 488.77 0 488
sdb 27431.00 0.01 463.55 0 463
sdb 13093.00 0.00 478.77 0 478
sdb 53676.00 0.00 413.79 0 413
sdb 16781.00 0.00 483.60 0 483
sdb 22267.00 0.00 323.32 0 323
sdb 23945.00 0.00 164.55 0 164
sdb 22867.00 0.00 152.25 0 152
sdb 22038.00 0.00 146.39 0 146
sdb 23825.00 0.00 263.61 0 263
...
sdb 20866.00 85.81 76.90 85 76
sdb 7684.00 101.75 115.19 101 115
sdb 3707.00 154.48 0.00 154 0
sdb 4349.00 181.41 0.00 181 0
sdb 4373.00 184.70 0.00 184 0
sdb 4329.00 185.04 0.00 185 0
sdb 4338.00 182.30 0.01 182 0
sdb 4364.00 184.27 0.00 184 0
sdb 5310.00 177.32 4.99 177 4
sdb 32130.00 99.07 119.70 99 119
sdb 27010.00 103.26 99.25 103 99
sdb 11109.00 67.18 99.96 67 99
sdb 3931.00 172.51 0.00 172 0
sdb 4112.00 171.28 0.00 171 0
sdb 4202.00 183.03 0.00 183 0
sdb 4119.00 183.79 0.00 183 0
sdb 4232.00 182.77 0.02 182 0
sdb 4224.00 185.90 0.00 185 0
sdb 4304.00 186.17 0.00 186 0
sdb 4279.00 188.83 0.00 188 0
sdb 4087.00 184.38 0.00 184 0
sdb 7758.00 163.86 16.70 163 16
sdb 21309.00 68.95 80.11 68 80
sdb 21166.00 81.66 78.42 81 78
sdb 19328.00 71.56 71.55 71 71
sdb 20836.00 89.08 76.52 89 76
sdb 3211.00 112.01 82.21 112 82
sdb 3939.00 173.40 0.00 173 0
sdb 3992.00 178.03 0.00 178 0
sdb 4251.00 181.49 0.00 181 0
sdb 4148.00 185.63 0.00 185 0
sdb 4094.00 184.12 0.01 184 0
sdb 4241.00 187.38 0.00 187 0
sdb 4044.00 186.60 0.00 186 0
sdb 4049.00 185.47 0.00 185 0
sdb 4247.00 189.17 0.00 189 0
...
sdb 17457.00 105.45 64.05 105 64
sdb 16736.00 82.12 62.35 82 62
sdb 12074.00 108.76 66.21 108 66
sdb 2232.00 194.44 0.00 194 0
sdb 2171.00 187.27 0.02 187 0
sdb 2322.00 197.91 0.00 197 0
sdb 2311.00 194.65 0.00 194 0
sdb 2240.00 187.93 0.00 187 0
sdb 2189.00 191.38 0.00 191 0
sdb 2266.00 192.33 0.01 192 0
sdb 2312.00 198.95 0.00 198 0
sdb 2310.00 199.84 0.00 199 0
sdb 2350.00 198.83 0.00 198 0
sdb 2275.00 198.31 0.00 198 0
sdb 3952.00 185.05 6.79 185 6
sdb 15842.00 59.89 59.67 59 59
sdb 16676.00 88.24 61.79 88 61
sdb 14768.00 75.94 55.00 75 54
sdb 5677.00 141.71 35.03 141 35
sdb 2135.00 184.78 0.04 184 0
sdb 2301.00 197.18 0.00 197 0
sdb 2334.00 198.81 0.00 198 0
sdb 2304.00 198.83 0.00 198 0
sdb 2348.00 198.67 0.00 198 0
sdb 2352.00 198.42 0.01 198 0
sdb 2373.00 199.32 0.00 199 0
sdb 2363.00 197.55 0.00 197 0
sdb 2289.00 198.71 0.00 198 0
sdb 2246.00 189.31 0.00 189 0
sdb 2357.00 198.64 0.01 198 0
sdb 2338.00 197.96 0.00 197 0
sdb 6292.00 177.60 16.56 177 16
sdb 19374.00 93.72 72.16 93 72
sdb 16873.00 101.38 62.01 101 62
sdb 16960.00 98.99 76.84 98 76
sdb 2299.00 189.32 6.16 189 6
sdb 2285.00 195.82 0.00 195 0
sdb 2346.00 198.25 0.00 198 0
sdb 2325.00 198.91 0.00 198 0
sdb 2353.00 197.72 0.02 197 0
sdb 2320.00 198.82 0.00 198 0
sdb 2327.00 200.05 0.00 200 0
sdb 2340.00 198.35 0.00 198 0
sdb 2322.00 199.29 0.00 199 0
sdb 2316.00 197.43 0.01 197 0
sdb 690.00 51.17 0.00 51 0
在@Howard Chu 的帮助下,我找到了问题所在。
读取过多是由 OS read-ahead 引起的。默认情况下,当我们读取一个页面时,OS 会将该页面之后的页面预取到内存中。然而,这在数据库中通常是不可取的,因为我们通常进行随机读取。所以我们需要禁用 read-ahead 以获得随机读取的最佳性能。
对于基准测试,有一个命令行选项 --readahead=0,它反过来为 LMDB 设置 MDB_NORDAHEAD 选项。
禁用read-ahead后,即时写入速率在测试结束时从 1MB/s 增加到 8MB/s。并且在 iostat 中观察到的 read/write 数量几乎相同。