Why does Java disk I/O perform so much slower than the equivalent I/O code written in C?
I have an SSD that, according to its spec, should deliver no less than 10k IOPS. My benchmark confirms it can give me 20k IOPS.
Then I wrote this test:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.Random;

public class NioTest {
    private static final int sector = 4*1024;           // bytes per read
    private static byte[] buf = new byte[sector];
    private static int duration = 10; // seconds to run
    private static long[] timings = new long[50000];    // per-read latency in ms; unused slots stay 0

    public static final void main(String[] args) throws IOException {
        String filename = args[0];              // device or file to read
        long size = Long.parseLong(args[1]);    // its size in bytes
        RandomAccessFile raf = new RandomAccessFile(filename, "r");
        Random rnd = new Random();
        long start = System.currentTimeMillis();
        int ios = 0;
        while (System.currentTimeMillis()-start<duration*1000) {
            long t1 = System.currentTimeMillis();
            // pick a random 4 KiB block, then convert the block index to a byte offset
            long pos = (long)(rnd.nextDouble()*(size>>12));
            raf.seek(pos<<12);
            int count = raf.read(buf);
            timings[ios] = System.currentTimeMillis() - t1;
            ++ios;
        }
        System.out.println("Measured IOPS: " + ios/duration);
        int totalBytes = ios*sector;
        double totalSeconds = (System.currentTimeMillis()-start)/1000.0;
        double speed = totalBytes/totalSeconds/1024/1024;
        System.out.println(totalBytes+" bytes transferred in "+totalSeconds+" secs ("+speed+" MiB/sec)");
        raf.close();
        // sort ascending; the indexes below count back from the end of the array
        Arrays.sort(timings);
        int l = timings.length;
        System.out.println("The longest IO = " + timings[l-1]);
        System.out.println("Median duration = " + timings[l-(ios/2)]);
        System.out.println("75% duration = " + timings[l-(ios * 3 / 4)]);
        System.out.println("90% duration = " + timings[l-(ios * 9 / 10)]);
        System.out.println("95% duration = " + timings[l-(ios * 19 / 20)]);
        System.out.println("99% duration = " + timings[l-(ios * 99 / 100)]);
    }
}
Then I run this example and get only 2186 IOPS:
$ sudo java -cp ./classes NioTest /dev/disk0 240057409536
Measured IOPS: 2186
89550848 bytes transferred in 10.0 secs (8.540234375 MiB/sec)
The longest IO = 35
Median duration = 0
75% duration = 0
90% duration = 0
95% duration = 0
99% duration = 0
Why is it so much slower than the same test in C?
Update: here is the Python code that delivers 20k IOPS:
import random
import time

# mediasize(dev) is a helper defined elsewhere (not shown here);
# from its use below it presumably returns the device size in bytes.
def iops(dev, blocksize=4096, t=10):
    fh = open(dev, 'r')
    count = 0
    start = time.time()
    while time.time() < start+t:
        count += 1
        pos = random.randint(0, mediasize(dev) - blocksize) # need at least one block left
        pos &= ~(blocksize-1)   # sector alignment at blocksize
        fh.seek(pos)
        blockdata = fh.read(blocksize)
    end = time.time()
    t = end - start
    fh.close()
Update 2: NIO code (just a fragment, I won't repeat the whole method)
...
RandomAccessFile raf = new RandomAccessFile(filename, "r");
InputStream in = Channels.newInputStream(raf.getChannel());
...
int count = in.read(buf);
...
Because you are using RandomAccessFile, which is one of the slowest ways to do disk I/O in Java. Try something faster, such as BufferedInputStream or BufferedOutputStream, and see how the speed compares.
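If you are wondering why this would matter on an SSD (since SSDs are supposed to be good at random access), it is not about the randomness of the access; it is about bandwidth. If you have an SSD with a 1024-bit-wide bus but you write only 64 bits at a time (as you would when writing longs or doubles), you will be slow. (Of course, these numbers are for illustration only.)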
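Now, I can see that this is not what your code is doing (or at least, not what it appears to be doing), but it is quite possible that RandomAccessFile implements it that way under the hood. Again, try a buffered stream and see what happens.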
According to this paper, random access in older versions of Java is 2.5 to 3.5 times slower. It is a research PDF, so don't blame me if you click it.
Link: http://pages.cs.wisc.edu/~guo/projects/736.pdf
Java raw I/O is slower than C/C++, since system calls in Java are
more expensive; buffering improves Java I/O performance, for it
reduces system calls, yet there is no big gain for larger buffer size;
direct buffering is better than the Java-provided buffered I/O
classes, since the user can tailor it for his own needs; increasing
the operation size helps I/O performance without overheads; and system
calls are cheap in Java native methods, while the overhead of calling
native methods is rather high. When the number of native calls is
reduced properly, a performance comparable to C/C++ can be achieved.
Your code is from that era. Now let's rewrite it using java.nio instead of RandomAccessFile, shall we?
I have some nio2 code that we can pit against C. Garbage collection can be ruled out :)
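As an illustration of what a java.nio version of the read loop might look like (this is a sketch, not the answerer's code), the following uses FileChannel positional reads into a direct ByteBuffer and times each read with System.nanoTime() rather than milliseconds. The class name ChannelReadSketch, the method timeRandomReads, and its parameters are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Random;

public class ChannelReadSketch {
    static final int BLOCK = 4096; // same 4 KiB block size as the question

    /** Performs `reads` random block-aligned reads and returns each read's latency in nanoseconds. */
    static long[] timeRandomReads(Path file, long blocks, int reads) throws IOException {
        long[] nanos = new long[reads];
        // A direct buffer lets the channel read without an extra copy through a heap array.
        ByteBuffer buf = ByteBuffer.allocateDirect(BLOCK);
        Random rnd = new Random();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (int i = 0; i < reads; i++) {
                long pos = (long) (rnd.nextDouble() * blocks) * BLOCK; // block index -> byte offset
                buf.clear();
                long t0 = System.nanoTime();
                ch.read(buf, pos); // positional read: no separate seek call
                nanos[i] = System.nanoTime() - t0;
            }
        }
        return nanos;
    }

    // Usage (hypothetical): java ChannelReadSketch <file-or-device> <size-in-bytes>
    public static void main(String[] args) throws IOException {
        long blocks = Long.parseLong(args[1]) / BLOCK;
        long[] nanos = timeRandomReads(Paths.get(args[0]), blocks, 20000);
        Arrays.sort(nanos);
        System.out.println("Median read latency (ns) = " + nanos[nanos.length / 2]);
    }
}
```

This still issues one read at a time, so by itself it only removes the seek call and the heap-array copy; it does not keep more than one request in flight.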
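Your question is based on the false assumption that C code similar to your Java code would perform as well as IOMeter. Because that assumption is false, there is no difference between C performance and Java performance to explain.
If your question is why your Java code performs so poorly relative to IOMeter, the answer is that IOMeter does not issue requests one at a time the way your code does. To get full performance from your SSD, you need to keep its request queue non-empty, and that is impossible if you wait for each read to complete before issuing the next one.
Try using a thread pool to issue your requests.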
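A minimal sketch of that suggestion, assuming a fixed-size pool in which every worker performs its own random 4 KiB positional reads on a shared FileChannel, so that several requests can be outstanding at once. The names ParallelReadSketch and parallelReads, and the thread and read counts, are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

public class ParallelReadSketch {
    static final int BLOCK = 4096;

    /** Runs `threads` workers, each doing `readsPerThread` random reads; returns the number of successful reads. */
    static long parallelReads(Path file, long blocks, int threads, int readsPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    // Each worker owns its buffer; positional reads on a FileChannel
                    // are safe to call from several threads concurrently.
                    ByteBuffer buf = ByteBuffer.allocateDirect(BLOCK);
                    long done = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        long pos = ThreadLocalRandom.current().nextLong(blocks) * BLOCK;
                        buf.clear();
                        if (ch.read(buf, pos) > 0) done++;
                    }
                    return done;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get(); // wait for every worker
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Dividing the returned count by the elapsed wall-clock time gives an IOPS figure; running it with 1 thread and then with, say, 8 or 32 would show how much the queue depth matters on a given device.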
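RandomAccess in Java is mostly fast, but it cannot compete with C.
However, if you want a better comparison of IO performance on the JVM, read Martin Thompson's excellent blog post on the subject: http://mechanical-sympathy.blogspot.co.uk/2011/12/java-sequential-io-performance.html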