In Java, how can I create an InputStream for a specific part of a file?

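I need an InputStream that reads a specific part of a file, and nothing more.

From the perspective of the InputStream's consumer, the content should appear to be just that specific part. A Consumer<InputStream> would not be aware that its data comes from a larger file.
So the InputStream should behave as follows: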

Path file= Paths.get("file.dat");
int start = 12000;
int size = 600;

try(InputStream input = getPartialInputStream(file, start, size)){
    // This should receive an InputStream that returns exactly 600 bytes.
    // Those bytes should correspond to the bytes in "file.dat" from position 12000 up to (but not including) 12600.
    thirdPartyMethod(input);
}

Is there a good way to do this without having to implement a custom InputStream myself?
What would such a getPartialInputStream method look like?

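Depending on where the original stream comes from, you may want to discard it and return a stream of your own. If the original stream supports reset(), the user on the receiving end could make the data before the start position visible to themselves.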

public InputStream getPartialInputStream(InputStream is, int start, int size) throws IOException {
    // Put your fast-forward logic here, might want to use is.skip() instead
    for (int i = 0; i < start; i++) {
        is.read();
    }
    // Rewrite the part of stream you want the caller to receive so that
    // they receive *only* this part
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    for (int i = 0; i < size; i++) {
        int read = is.read();
        if (read != -1) {
            baos.write(read);
        } else {
            break;
        }
    }
    is.close();
    return new ByteArrayInputStream(baos.toByteArray());
}
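
The comment in the code above suggests is.skip() for the fast-forward step. One caveat: skip() is allowed to skip fewer bytes than requested, so it usually needs a loop. Below is a minimal sketch of such a loop; StreamSkipper and skipFully are hypothetical names, not part of the original answer. On Java 12 and later, InputStream.skipNBytes should cover the same need.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, named here for illustration only.
final class StreamSkipper {
    private StreamSkipper() {}

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and a single-byte read hit end of stream.
                throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
            } else {
                // The single-byte read consumed one byte.
                remaining--;
            }
        }
    }
}
```

With this helper, the first loop in getPartialInputStream could be replaced by a single call such as StreamSkipper.skipFully(is, start).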

Edit, in response to the comments:

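If you would rather not rewrite the stream, for example because of memory constraints, you can read the first start bytes as in the first loop and then return a stream such as Guava's ByteStreams.limit(is, size). Alternatively, subclass the stream and override read() with a counter so that it keeps returning -1 once size bytes have been read.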
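
The subclassing option could look roughly like the sketch below. It wraps an already fast-forwarded stream in a FilterInputStream and counts down the remaining bytes. LimitedInputStream is a hypothetical name chosen for illustration; it is not Guava's class and is not from the original answer. The sketch also disables mark/reset, so the consumer cannot rewind into data outside the window.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: exposes at most `limit` bytes of the underlying stream.
final class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    // Disable mark/reset so the consumer cannot rewind past the window.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // intentionally a no-op
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```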

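You could also write the data to a temporary file and return a stream over that file. This would prevent the end user from discovering the original file name by using reflection on the original file's FileInputStream.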
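
A rough sketch of the temporary-file idea follows, assuming the source stream has already been advanced to the start position. TempFileSlice and copyToTempFile are hypothetical names. The sketch opens the temporary file with StandardOpenOption.DELETE_ON_CLOSE so that it is removed when the consumer closes the stream; whether that cleanup behaviour fits your platform and use case is something to verify.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, named here for illustration only.
final class TempFileSlice {
    private TempFileSlice() {}

    /** Copies up to `size` bytes from `source` into a temp file and returns a stream over it. */
    static InputStream copyToTempFile(InputStream source, long size) throws IOException {
        Path tmp = Files.createTempFile("slice-", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] chunk = new byte[8192];
            long remaining = size;
            while (remaining > 0) {
                int n = source.read(chunk, 0, (int) Math.min(chunk.length, remaining));
                if (n == -1) {
                    break; // source ended before `size` bytes were available
                }
                out.write(chunk, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        // The temp file is deleted when the returned stream is closed.
        return Files.newInputStream(tmp, StandardOpenOption.DELETE_ON_CLOSE);
    }
}
```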

There is something called a MappedByteBuffer, whose content is a memory-mapped region of a file.

Another question has an answer that shows how to map a MappedByteBuffer to an InputStream. This led me to the following solution:

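public InputStream getPartialInputStream(Path file, long start, long size) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        // A mapping, once established, does not depend on the channel that created it,
        // so closing the channel here leaves the buffer usable.
        MappedByteBuffer content = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
        return new ByteBufferBackedInputStream(content);
    }
}
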
public class ByteBufferBackedInputStream extends InputStream {

    ByteBuffer buf;

    public ByteBufferBackedInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    public int read() throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }
        return buf.get() & 0xFF;
    }

    public int read(byte[] bytes, int off, int len)
            throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }

        len = Math.min(len, buf.remaining());
        buf.get(bytes, off, len);
        return len;
    }
}

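Warning about locking system resources (on Windows)

MappedByteBuffer suffers from a bug where the underlying file stays locked by the mapped buffer until the buffer itself is garbage collected, and there is no clean workaround.

So you can only use this solution if you do not need to delete/move/rename the file afterwards. Trying to do so results in a java.nio.file.AccessDeniedException (unless you are lucky enough that the buffer has already been garbage collected).

I am not sure I should hold out any hope of this getting fixed soon.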

I wrote a utility class that you can use like this:

try(FileChannel channel = FileChannel.open(file, READ);
    InputStream input = new PartialChannelInputStream(channel, start, start + size)) {

    thirdPartyMethod(input);
}

It reads the file content through a ByteBuffer, so you control the memory footprint.

import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PartialChannelInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long position;
    private final long end;

    public PartialChannelInputStream(FileChannel channel, long start, long end)
            throws IOException {
        this(channel, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialChannelInputStream(FileChannel channel, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }

        this.channel = channel;
        this.position = start;
        this.end = end;
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        fillBuffer(end - start);
    }

    private void fillBuffer(long stillToRead) throws IOException {
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        channel.read(buffer, position);
        buffer.flip();
    }

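    @Override
    public int read() throws IOException {
        long stillToRead = end - position;
        if (stillToRead <= 0) {
            return -1;
        }

        if (!buffer.hasRemaining()) {
            // clear() resets the position and restores the limit to full capacity before refilling
            buffer.clear();
            fillBuffer(stillToRead);
        }

        try {
            position++;
            // Mask to 0..255 so that bytes >= 0x80 are not returned as negative values
            // (an unmasked 0xFF would be mistaken for end of stream)
            return buffer.get() & 0xFF;
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            position = end;
            return -1;
        }
    }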
}

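The implementation above allows several PartialChannelInputStream instances to read from the same FileChannel and be used at the same time.
If that is not necessary, the simplified code below takes a Path directly: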

import static java.nio.file.StandardOpenOption.READ;

import java.io.IOException;
import java.io.InputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class PartialFileInputStream extends InputStream {

    private static final int DEFAULT_BUFFER_CAPACITY = 2048;

    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long stillToRead;

    public PartialFileInputStream(Path file, long start, long end)
            throws IOException {
        this(file, start, end, DEFAULT_BUFFER_CAPACITY);
    }

    public PartialFileInputStream(Path file, long start, long end, int bufferCapacity)
            throws IOException {
        if (start > end) {
            throw new IllegalArgumentException("start(" + start + ") > end(" + end + ")");
        }

        // position(start) returns the channel itself, already moved to the start offset
        this.channel = FileChannel.open(file, READ).position(start);
        this.buffer = ByteBuffer.allocateDirect(bufferCapacity);
        this.stillToRead = end - start;
        fillBuffer();
    }

    private void fillBuffer() throws IOException {
        if (stillToRead < buffer.limit()) {
            buffer.limit((int) stillToRead);
        }
        channel.read(buffer);
        buffer.flip();
    }

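    @Override
    public int read() throws IOException {
        if (stillToRead <= 0) {
            return -1;
        }

        if (!buffer.hasRemaining()) {
            // clear() resets the position and restores the limit to full capacity before refilling
            buffer.clear();
            fillBuffer();
        }

        try {
            stillToRead--;
            // Mask to 0..255 so that bytes >= 0x80 are not returned as negative values
            return buffer.get() & 0xFF;
        } catch (BufferUnderflowException e) {
            // Encountered EOF
            stillToRead = 0;
            return -1;
        }
    }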

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
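
Both classes above only override the single-byte read(), so bulk reads from the consumer fall back to InputStream's default byte-by-byte loop. As a possible variation, and not part of the original answer, the sketch below reads straight into the caller's array with a positional channel read and needs no intermediate buffer. ChannelSliceInputStream is a hypothetical name; like PartialChannelInputStream, it leaves closing the channel to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical variant: serves the byte range [start, end) of a FileChannel.
final class ChannelSliceInputStream extends InputStream {
    private final FileChannel channel;
    private final long end;
    private long position;

    ChannelSliceInputStream(FileChannel channel, long start, long end) {
        if (start < 0 || start > end) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        this.channel = channel;
        this.position = start;
        this.end = end;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do {
            n = read(one, 0, 1);
        } while (n == 0); // retry in the unlikely case that nothing was read
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        long stillToRead = end - position;
        if (stillToRead <= 0) {
            return -1;
        }
        int toRead = (int) Math.min(len, stillToRead);
        // Positional read: does not touch the channel's own position,
        // so several streams can share one channel.
        int n = channel.read(ByteBuffer.wrap(b, off, toRead), position);
        if (n > 0) {
            position += n;
        }
        return n; // -1 if the file ended before `end`
    }
}
```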