In Java, how can I create an InputStream for a specific part of a file?
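I need an InputStream that reads a specific part of a file, and nothing more.
From the perspective of the consumer of the InputStream, the content appears to be only that specific part. The Consumer<InputStream> is not aware that its data comes from a much larger file.
Therefore the InputStream should behave as follows:
- The beginning of the file is skipped silently.
- Then the desired part of the file is returned.
- Subsequent calls to is.read() return -1, even if the file contains more data.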
Path file = Paths.get("file.dat");
int start = 12000;
int size = 600;
try (InputStream input = getPartialInputStream(file, start, size)) {
    // This should receive an InputStream that returns exactly 600 bytes.
    // Those bytes should correspond to the bytes in "file.dat" found from position 12000 up to 12600.
    thirdPartyMethod(input);
}
Is there a good way to do this without having to implement a custom InputStream myself?
What would such a getPartialInputStream method look like?
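Depending on where the original stream comes from, you may want to discard it and return your own stream. If the original stream supports reset(), the user on the receiving end might be able to make the beginning of the data visible to themselves.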
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public InputStream getPartialInputStream(InputStream is, int start, int size) throws IOException {
    // Fast-forward past the first `start` bytes and stop early if the stream ends.
    // You might want to use is.skip() instead, but note that it may skip fewer bytes than requested.
    for (int i = 0; i < start; i++) {
        if (is.read() == -1) {
            break;
        }
    }
    // Copy the part of the stream you want the caller to receive so that
    // they receive *only* this part
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    for (int i = 0; i < size; i++) {
        int read = is.read();
        if (read != -1) {
            baos.write(read);
        } else {
            break;
        }
    }
    is.close();
    return new ByteArrayInputStream(baos.toByteArray());
}
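Edit, in answer to a comment:
If rewriting the stream is not desirable, for example because of memory constraints, you can read the start bytes as in the first loop and then return a stream using something like Guava's ByteStreams.limit(is, size). Or subclass the stream and override read() with a counter so that it keeps returning -1 as soon as size bytes have been read.
You could also write a temporary file and return its stream; this would prevent the end user from finding the file name through reflection on the original file's FileInputStream.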
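The "subclass and count" idea could look roughly like the sketch below. It is only an illustration: the class name LimitedInputStream and the helper skipFully are hypothetical names, not part of the JDK or Guava. It wraps any InputStream, skips the first start bytes, exposes at most limit bytes, and rejects reset() so the consumer cannot rewind into the skipped data.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: skips `start` bytes, then exposes at most `limit` bytes. */
class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long start, long limit) throws IOException {
        super(in);
        skipFully(in, start);
        this.remaining = limit;
    }

    // skip() may skip fewer bytes than asked, so loop and fall back to read()
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                return; // stream ended before reaching `start`
            } else {
                n--;
            }
        }
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the wrapped stream for more than the window still allows
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    // Disable mark/reset so the consumer cannot rewind past the window
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

With this sketch, the method from the question could be written as new LimitedInputStream(Files.newInputStream(file), start, size). Closing the wrapper closes the underlying stream, because FilterInputStream delegates close().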
There is something called a MappedByteBuffer, whose content is a memory-mapped region of a file.
Another question has an answer that shows how to wrap a MappedByteBuffer in an InputStream. This led me to the following solution:
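import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
import static java.nio.file.StandardOpenOption.READ;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public InputStream getPartialInputStream(Path file, long start, long size) throws IOException {
    try (FileChannel channel = FileChannel.open(file, READ)) {
        // The mapping stays valid after the channel is closed
        MappedByteBuffer content = channel.map(READ_ONLY, start, size);
        return new ByteBufferBackedInputStream(content);
    }
}

public class ByteBufferBackedInputStream extends InputStream {

    private final ByteBuffer buf;

    public ByteBufferBackedInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override
    public int read() throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }
        // Mask to 0..255 so a byte such as 0xFF is not mistaken for end of stream
        return buf.get() & 0xFF;
    }

    @Override
    public int read(byte[] bytes, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        len = Math.min(len, buf.remaining());
        buf.get(bytes, off, len);
        return len;
    }
}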
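Warning about locked system resources (Windows)
MappedByteBuffer suffers from a bug where the underlying file remains locked by the mapped buffer until the buffer itself is garbage collected, and there is no clean workaround.
So you can use this solution only if you do not have to delete/move/rename the file afterwards. Attempting to do so results in a java.nio.file.AccessDeniedException (unless you are lucky enough that the buffer has already been garbage collected).
I am not sure I should have any hope of this getting fixed soon.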
I wrote a utility class that you can use like this:
try (FileChannel channel = FileChannel.open(file, READ);
     InputStream input = new PartialChannelInputStream(channel, start, start + size)) {
    thirdPartyMethod(input);
}
It reads the file content through a ByteBuffer, so you can control the memory footprint.
import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PartialChannelInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long position;
    private final long end;

    public PartialChannelInputStream(FileChannel channel, long start, long end)
            throws IOException {
        this(channel, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialChannelInputStream(FileChannel channel, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }
        this.channel = channel;
        this.position = start;
        this.end = end;
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        fillBuffer(end - start);
    }

    private void fillBuffer(long stillToRead) throws IOException {
        // Reset position and limit so the full capacity is available again
        buffer.clear();
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        // Absolute read: does not change the channel's own position
        channel.read(buffer, position);
        buffer.flip();
    }

    @Override
    public int read() throws IOException {
        long stillToRead = end - position;
        if (stillToRead <= 0) {
            return -1;
        }
        if (!buffer.hasRemaining()) {
            fillBuffer(stillToRead);
        }
        try {
            // Mask to 0..255 as required by the InputStream contract
            int value = buffer.get() & 0xFF;
            position++;
            return value;
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            position = end;
            return -1;
        }
    }
}
The implementation above allows you to create several PartialChannelInputStream instances that read from the same FileChannel and to use them concurrently.
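Sharing one channel works because the class uses the absolute form FileChannel.read(ByteBuffer, long), which reads at a given offset without moving the channel's own position. The following small sketch illustrates that property on its own; the class name PositionalReadDemo, the helper readRegion and the temporary file are assumptions made only for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalReadDemo {

    /** Reads up to `length` bytes starting at `offset` without touching the channel's position. */
    static byte[] readRegion(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long pos = offset;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, pos); // absolute read
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        buffer.flip();
        byte[] result = new byte[buffer.remaining()];
        buffer.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("partial", ".dat");
        try {
            Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                String a = new String(readRegion(channel, 2, 4), StandardCharsets.US_ASCII);
                String b = new String(readRegion(channel, 10, 3), StandardCharsets.US_ASCII);
                // Two independent regions were read; the channel position is still 0
                System.out.println(a + " " + b + " " + channel.position()); // prints "2345 abc 0"
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Each reader keeps track of its own offset, so the readers do not interfere with each other through the shared channel.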
If that is not necessary, the simplified code below takes a Path directly.
import static java.nio.file.StandardOpenOption.READ;

import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class PartialFileInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long stillToRead;

    public PartialFileInputStream(Path file, long start, long end)
            throws IOException {
        this(file, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialFileInputStream(Path file, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }
        // The stream owns this channel and closes it in close()
        this.channel = FileChannel.open(file, READ).position(start);
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        this.stillToRead = end - start;
        fillBuffer();
    }

    private void fillBuffer() throws IOException {
        // Reset position and limit so the full capacity is available again
        buffer.clear();
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        // Relative read: advances the channel's position
        channel.read(buffer);
        buffer.flip();
    }

    @Override
    public int read() throws IOException {
        if (stillToRead <= 0) {
            return -1;
        }
        if (!buffer.hasRemaining()) {
            fillBuffer();
        }
        try {
            // Mask to 0..255 as required by the InputStream contract
            int value = buffer.get() & 0xFF;
            stillToRead--;
            return value;
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            stillToRead = 0;
            return -1;
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}