Why does Files.readAllBytes first read with a bufsize of 1?
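I am writing a simple Linux USB character driver that allows a short string to be read from the device node it creates.
It works fine, but I noticed a difference between reading from the device node with cat and reading it from a Java program with Files.readAllBytes.
When reading with cat, a buffer of size 131072 is passed in on the first call to the file_operations.read function, and the 5-byte string is copied: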
kernel: [46863.186331] usbtherm: Device was opened
kernel: [46863.186407] usbtherm: buffer: 131072, read: 5, offset: 5
kernel: [46863.186444] usbtherm: done, returning 0
kernel: [46863.186481] usbtherm: Device was released
When reading with Files.readAllBytes, the first call passes in a buffer of size 1, then a buffer of size 8191 is passed in and the remaining 4 bytes are copied:
kernel: [51442.728879] usbtherm: Device was opened
kernel: [51442.729032] usbtherm: buffer: 1, read: 1, offset: 1
kernel: [51442.729102] usbtherm: buffer: 8191, read: 4, offset: 5
kernel: [51442.729140] usbtherm: done, returning 0
kernel: [51442.729158] usbtherm: Device was released
The file_operations.read function (including the debug printk calls) is:
static ssize_t device_read(struct file *filp, char __user *buffer, size_t length,
        loff_t *offset)
{
    int err = 0;
    size_t msg_len = 0;
    size_t len_read = 0;

    msg_len = strlen(message);

    /* Everything has already been read: signal EOF. */
    if (*offset >= msg_len)
    {
        printk(KERN_INFO "usbtherm: done, returning 0\n");
        return 0;
    }

    /* Copy at most as many bytes as the caller's buffer can hold. */
    len_read = msg_len - *offset;
    if (len_read > length)
    {
        len_read = length;
    }

    err = copy_to_user(buffer, message + *offset, len_read);
    if (err)
    {
        err = -EFAULT;
        goto error;
    }

    *offset += len_read;

    printk(KERN_INFO "usbtherm: buffer: %zu, read: %zu, offset: %lld\n",
            length, len_read, *offset);

    return len_read;

error:
    return err;
}
The string that is read is the same in both cases, so I suppose this is fine. I am just wondering why the behaviour differs.
GNU cat
In the source of cat,
insize = io_blksize (stat_buf);
you can see that the buffer size is determined by coreutils' io_blksize(), which carries a rather interesting comment in this regard,
/* As of May 2014, 128KiB is determined to be the minimium blksize
to best minimize system call overhead.
So that would explain the result with cat, since 128KiB is 131072 bytes and the GNUrus have decided that this is the best way to minimize system call overhead.
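To make the cat side concrete, here is a minimal Java sketch of the same strategy: always ask for up to 128 KiB per read until EOF. The class name CatStyleRead, the constant BLOCK_SIZE and the 5-byte sample message are illustrative assumptions, and an in-memory stream stands in for the device node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CatStyleRead {
    // 128 KiB, the request size that shows up in the cat log.
    static final int BLOCK_SIZE = 128 * 1024;

    // Reads the stream to EOF, asking for up to BLOCK_SIZE bytes on every call.
    static byte[] readAll(InputStream in) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(block, 0, block.length)) > 0) {
            out.write(block, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 5-byte message standing in for the device node's content.
        byte[] message = "21.5\n".getBytes(StandardCharsets.US_ASCII);
        byte[] result = readAll(new ByteArrayInputStream(message));
        System.out.println(BLOCK_SIZE + " byte buffer, " + result.length + " bytes read");
    }
}
```

With a short message, the first request already returns everything and the second one hits EOF, which matches the two calls visible in the cat log.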
Files.readAllBytes
is a bit harder to grasp, at least for simple souls like me. The source of readAllBytes
public static byte[] readAllBytes(Path path) throws IOException {
    try (SeekableByteChannel sbc = Files.newByteChannel(path);
         InputStream in = Channels.newInputStream(sbc)) {
        long size = sbc.size();
        if (size > (long)MAX_BUFFER_SIZE)
            throw new OutOfMemoryError("Required array size too large");
        return read(in, (int)size);
    }
}
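shows that it simply calls read(InputStream, initialSize), where the initial size is determined by the size of the byte channel. The size() method also has an interesting comment,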
The size of files that are not isRegularFile() files is implementation
specific and therefore unspecified.
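To check what size() reports for a particular path, a small sketch along these lines could be used. The class name ChannelSize and the helper reportedSize are hypothetical. Passing the device node's path as the first argument would show the value that readAllBytes uses as its initial size; without an argument it falls back to a 5-byte temporary file.

```java
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ChannelSize {
    // Returns the size that the byte channel reports for the given path.
    static long reportedSize(Path path) throws IOException {
        try (SeekableByteChannel sbc = Files.newByteChannel(path)) {
            return sbc.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path;
        if (args.length > 0) {
            // For example the device node created by the driver.
            path = Paths.get(args[0]);
        } else {
            // Fallback: a regular temporary file holding 5 bytes.
            path = Files.createTempFile("channel-size", ".txt");
            Files.write(path, new byte[] {1, 2, 3, 4, 5});
            path.toFile().deleteOnExit();
        }
        System.out.println(path + ": regular file = " + Files.isRegularFile(path)
                + ", reported size = " + reportedSize(path));
    }
}
```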
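Finally, read(InputStream, initialSize) calls InputStream.read(byteArray, offset, length) to do the reading. The comments in the code below come from the original source and are confusing in this case: since capacity - nread = 0, the while loop does not read to EOF the first time it is reached.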
private static byte[] read(InputStream source, int initialSize)
        throws IOException {
    int capacity = initialSize;
    byte[] buf = new byte[capacity];
    int nread = 0;
    int n;
    for (;;) {
        // read to EOF which may read more or less than initialSize (eg: file
        // is truncated while we are reading)
        while ((n = source.read(buf, nread, capacity - nread)) > 0)
            nread += n;

        // if last call to source.read() returned -1, we are done
        // otherwise, try to read one more byte; if that failed we're done too
        if (n < 0 || (n = source.read()) < 0)
            break;

        // one more byte was read; need to allocate a larger buffer
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            capacity = Math.max(capacity << 1, BUFFER_SIZE);
        } else {
            if (capacity == MAX_BUFFER_SIZE)
                throw new OutOfMemoryError("Required array size too large");
            capacity = MAX_BUFFER_SIZE;
        }
        buf = Arrays.copyOf(buf, capacity);
        buf[nread++] = (byte)n;
    }
    return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
}
The declaration of BUFFER_SIZE:
// buffer size used for reading and writing
private static final int BUFFER_SIZE = 8192;
The documentation/source of InputStream.read(byteArray, offset, length) contains the relevant comment,
If length is zero, then no bytes are read and 0 is returned;
Since size() returns 0 bytes for your device node, this is what happens in read(InputStream source, int initialSize):
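In the first round of the for (;;) loop:
capacity=0 and nread=0. So the source.read in while ((n = source.read(buf, nread, capacity - nread)) > 0) reads 0 bytes into buf and returns 0: the condition of the while loop is false, and all it does is set n = 0 as a side effect of evaluating the condition.
Since n = 0, the source.read() in if (n < 0 || (n = source.read()) < 0) break; reads 1 byte and the expression evaluates to false: our for loop does not exit. This results in your "buffer: 1, read: 1, offset: 1".
The capacity of the buffer is set to BUFFER_SIZE, the single byte that was read is put into buf[0], and nread is incremented.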
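The second round of the for (;;) loop
therefore has capacity=8192 and nread=1, which makes while ((n = source.read(buf, nread, capacity - nread)) > 0) nread += n; request up to 8191 bytes starting at buffer offset 1, repeating until source.read returns -1: EOF! That happens after the remaining 4 bytes have been read. This results in your "buffer: 8191, read: 4, offset: 5".
Since n = -1 now, the expression in if (n < 0 || (n = source.read()) < 0) break; short-circuits on n < 0, which makes our for loop exit without reading any more bytes.
Finally, the method returns Arrays.copyOf(buf, nread): a copy of the part of the buffer that holds the bytes that were read.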
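The walkthrough can be reproduced without the driver. The sketch below copies the read(InputStream, int) loop quoted earlier and runs it against an in-memory stand-in that records the length of every request it receives. ReadTrace, TracingSource, requestSizes and the 5-byte sample message are hypothetical names and data, MAX_BUFFER_SIZE is assumed to be Integer.MAX_VALUE - 8, and zero-length reads are assumed never to reach the source, in line with the documentation quoted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadTrace {
    static final int BUFFER_SIZE = 8192;
    // Assumption: the JDK's limit; the value is not shown in the text above.
    static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;

    // Stand-in for the device: serves a fixed message and records each request length.
    static final class TracingSource extends InputStream {
        private final byte[] data;
        private int pos = 0;
        final List<Integer> requests = new ArrayList<>();

        TracingSource(byte[] data) { this.data = data; }

        @Override
        public int read() {
            requests.add(1); // a single-byte read asks for exactly 1 byte
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0; // zero-length reads return 0 without touching the source
            requests.add(len);
            if (pos >= data.length) return -1;
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    // Same loop as the read(InputStream, int) method quoted above.
    static byte[] read(InputStream source, int initialSize) throws IOException {
        int capacity = initialSize;
        byte[] buf = new byte[capacity];
        int nread = 0;
        int n;
        for (;;) {
            while ((n = source.read(buf, nread, capacity - nread)) > 0)
                nread += n;
            if (n < 0 || (n = source.read()) < 0)
                break;
            if (capacity <= MAX_BUFFER_SIZE - capacity) {
                capacity = Math.max(capacity << 1, BUFFER_SIZE);
            } else {
                if (capacity == MAX_BUFFER_SIZE)
                    throw new OutOfMemoryError("Required array size too large");
                capacity = MAX_BUFFER_SIZE;
            }
            buf = Arrays.copyOf(buf, capacity);
            buf[nread++] = (byte) n;
        }
        return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
    }

    // Runs the loop over a 5-byte message and returns the recorded request lengths.
    static List<Integer> requestSizes(int initialSize) throws IOException {
        TracingSource source = new TracingSource("21.5\n".getBytes(StandardCharsets.US_ASCII));
        read(source, initialSize);
        return source.requests;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("size() == 0: " + requestSizes(0)); // prints [1, 8191, 8187]
        System.out.println("size() == 5: " + requestSizes(5)); // prints [5, 1]
    }
}
```

With an initial size of 0, the recorded requests are 1, 8191 and 8187. The first two match the driver log, and the third is the call that the driver answers with "done, returning 0". With an initial size of 5, as a regular file of that length would report, the loop asks for 5 bytes and then probes for EOF with a single-byte read.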