Java NIO: scan through a ByteBuffer for certain bytes and work with sections

Question

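OK, so I'm trying to do something that seems like it should be fairly simple, but these new NIO interfaces are confusing me! Here's what I'm trying to do: I need to scan through a file byte by byte until I hit certain bytes. When I hit those specific bytes, I need to grab that section of data, do something with it, and then carry on doing the same thing. I thought that with all the marks, positions and limits in ByteBuffer I'd be able to do this, but I can't seem to make it work! Here's what I have so far..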

test.txt:

this is a line of text a
this is line 2b
line 3
line 4
line etc.etc.etc.

Test.java:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = Charset.forName("UTF-8");
    public static final byte[] NEWLINE_BYTE = {0x0A, 0x0D};

    public Test() {

        String pathString = "test.txt";

        //the path to the file
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path, 
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {            
            if (fc.size() > 0) {
                int n;
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {                    
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                int pos = 0;
                System.out.println("FILE LOADED: |" + new String(buffer.array(), ENCODING) + "|");
                do {
                    byte b = buffer.get();
                    if (b == NEWLINE_BYTE[0] || b == NEWLINE_BYTE[1]) {
                        System.out.println("POS: " + pos);
                        System.out.println("POSITION: " + buffer.position());
                        System.out.println("LENGTH: " + Integer.toString(buffer.position() - pos));
                        ByteBuffer lineBuffer = ByteBuffer.wrap(buffer.array(), pos + 1, buffer.position() - pos);
                        System.out.println("LINE: |" + new String(lineBuffer.array(), ENCODING) + "|");
                        pos = buffer.position();
                    }
                } while (buffer.hasRemaining());
            } 
        } catch (IOException ioe) {
           ioe.printStackTrace();
        }
    }
    public static void main(String args[]) {
        Test t = new Test();
    }
}

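So the first part works fine: fc.read(buffer) runs only once and pulls the entire file into the ByteBuffer. Then in the second do loop I can loop through byte by byte just fine, and it does hit the if statement when it encounters a \n (or \r), but then I can't figure out how to get the portion of bytes I've just looked at into a separate byte array to work with! I've tried slice() and various flips, and I've tried wrap() as shown in the code above, but I can't make it work: both buffers always contain the whole file, and so does anything I slice or wrap!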
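The reason every buffer seems to hold the whole file is that ByteBuffer.wrap(array, offset, length) neither copies nor trims anything: the new buffer shares the same backing array and only gets position = offset and limit = offset + length, while array() always returns that entire shared array. A minimal sketch of reading just one range follows; the class WrapDemo and the helper section() are hypothetical names, and a heap buffer (as returned by ByteBuffer.allocate() or wrap()) is assumed, because array() is not available on direct or read-only buffers. It shows two options: decode only between position and limit, or pass explicit bounds to the String constructor.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WrapDemo {
    /** Decodes only bytes [start, end) of a heap buffer, instead of its whole array(). */
    static String section(ByteBuffer buffer, int start, int end) {
        // array() always returns the entire backing array, so the range must be passed
        // explicitly; arrayOffset() is non-zero only when buffer is itself a slice
        return new String(buffer.array(), buffer.arrayOffset() + start, end - start,
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8);

        // wrap(array, offset, length) only sets position = 7 and limit = 13;
        // the new buffer still shares all 14 bytes of data
        ByteBuffer wrapped = ByteBuffer.wrap(data, 7, 6);
        System.out.println(wrapped.array().length);                      // 14
        System.out.println(wrapped.position() + ".." + wrapped.limit()); // 7..13

        // Option 1: explicit bounds into the shared array
        System.out.println(section(wrapped, wrapped.position(), wrapped.limit())); // line 2

        // Option 2: Charset.decode() reads only from position up to limit
        System.out.println(StandardCharsets.UTF_8.decode(wrapped)); // line 2
    }
}
```

In other words, the wrap() in the question was close; the problem was printing lineBuffer.array(), which ignores position and limit.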

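I just need to loop through the file byte by byte, looking at one section at a time, and then, as my end goal, once I've looked and found the right spot, insert some data at that spot! I need the lineBuffer printed at "LINE: " to contain only the portion of bytes I've looped over so far! Help, thanks!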
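A ByteBuffer has a fixed capacity, so "inserting" data at the spot found by the scan cannot be done in place; one way is to copy into a larger buffer: the bytes before the spot, the new bytes, then the rest. A minimal sketch, with the hypothetical names InsertDemo, indexOf() and insertAt(), assuming the whole file content sits in the buffer starting at index 0 as in the question's code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class InsertDemo {
    /** Absolute index of the first occurrence of b at or after from, or -1. */
    static int indexOf(ByteBuffer buf, byte b, int from) {
        for (int i = from; i < buf.limit(); i++) {
            if (buf.get(i) == b) {   // absolute get: does not move buf's position
                return i;
            }
        }
        return -1;
    }

    /** Returns a new buffer holding src's bytes [0, limit) with extra inserted at index at. */
    static ByteBuffer insertAt(ByteBuffer src, int at, byte[] extra) {
        ByteBuffer out = ByteBuffer.allocate(src.limit() + extra.length);
        ByteBuffer view = src.duplicate(); // same content, independent position/limit
        view.position(0);
        view.limit(at);
        out.put(view);                     // bytes before the insertion point
        out.put(extra);                    // the new data
        view.limit(src.limit());
        view.position(at);
        out.put(view);                     // bytes after the insertion point
        out.flip();                        // ready for reading or writing out
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        int at = indexOf(buf, (byte) '\n', 0) + 1; // just after the first newline
        ByteBuffer result = insertAt(buf, at, "NEW\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(StandardCharsets.UTF_8.decode(result));
    }
}
```

To put the result back into the file, one option is to write it with FileChannel.write(result, 0) on the channel opened with WRITE, looping while result.hasRemaining(); because the new content is never shorter than the old, no truncation is needed.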

Answer 1

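Setting the I/O aside, once you have the content in a ByteBuffer, it is much simpler to turn it into a CharBuffer via asCharBuffer(). CharBuffer implements CharSequence, which gives you many String-like and regex methods to work with.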
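One caveat on the CharBuffer route: ByteBuffer.asCharBuffer() does not decode text; it views every two bytes as one 16-bit char in the buffer's byte order. For a UTF-8 file, a sketch would obtain the CharBuffer with Charset.decode() instead, and can then run a java.util.regex Matcher directly on it, since CharBuffer is a CharSequence. The class CharBufferDemo and method lines() are hypothetical names; the line-break pattern \r\n|\r|\n is an assumption about which terminators to accept.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharBufferDemo {
    // "\r\n" is listed first so a Windows line ending counts as one terminator
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

    static List<String> lines(ByteBuffer bytes) {
        // decode() really interprets the bytes as UTF-8; asCharBuffer() would not
        CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
        List<String> result = new ArrayList<>();
        Matcher m = LINE_BREAK.matcher(chars); // works because CharBuffer is a CharSequence
        int start = 0;
        while (m.find()) {
            result.add(chars.subSequence(start, m.start()).toString());
            start = m.end();
        }
        if (start < chars.length()) {
            // last line without a terminator
            result.add(chars.subSequence(start, chars.length()).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                "this is line 1\r\nline 2\nline 3".getBytes(StandardCharsets.UTF_8));
        System.out.println(lines(buf)); // [this is line 1, line 2, line 3]
    }
}
```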

Answer 2

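Here is the solution I ended up with. It uses ByteBuffer's bulk relative get() to fetch one chunk at a time. I think I'm using mark() as intended, although I use an additional variable (pos) to remember the current position before calling reset(), because I couldn't find a method in ByteBuffer that returns the position of the mark itself. I also explicitly handle finding \r, \n, or both in sequence. Keep in mind that this code only works for UTF-8 encoded data. I hope this helps someone else.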

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = StandardCharsets.UTF_8;
    public static final byte LF = 0x0A; // '\n'
    public static final byte CR = 0x0D; // '\r'

    public Test() {
        //test text file: a sequence of arbitrary strings, each followed by a newline
        String pathString = "test.txt";
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {

            if (fc.size() > 0) {
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                //read until the buffer is full or the channel reports end-of-stream
                int n;
                do {
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();

                //mark the start of the first line
                buffer.mark();
                while (buffer.hasRemaining()) {
                    //get one byte at a time
                    byte b = buffer.get();
                    if (b != LF && b != CR) {
                        continue;
                    }
                    //pos = index just past the newline byte we found
                    int pos = buffer.position();
                    int newlineByteCount = 1;
                    //treat "\r\n" as one terminator (absolute get, position is unchanged;
                    //the hasRemaining() check avoids an underflow at the end of the file)
                    if (b == CR && buffer.hasRemaining() && buffer.get(pos) == LF) {
                        newlineByteCount++;
                    }
                    //reset the buffer back to the mark() position, i.e. the start of the line
                    buffer.reset();
                    //create an array of exactly the right length and bulk-get the line's bytes
                    int length = pos - 1 - buffer.position();
                    byte[] lineBytes = new byte[length];
                    buffer.get(lineBytes, 0, length);
                    System.out.println("LINE: " + new String(lineBytes, ENCODING));

                    //skip the terminator, then mark the start of the next line
                    buffer.position(buffer.position() + newlineByteCount);
                    buffer.mark();
                }
                //the last line may not end with a newline: print whatever follows the mark
                buffer.reset();
                if (buffer.hasRemaining()) {
                    byte[] lastLine = new byte[buffer.remaining()];
                    buffer.get(lastLine);
                    System.out.println("LINE: " + new String(lastLine, ENCODING));
                }
            }
        } catch (IOException ioe) { ioe.printStackTrace(); }
    }
    public static void main(String args[]) { new Test(); }
}
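The answer above copies each line into a new byte[] with a bulk get(). A different, zero-copy technique (not the answerer's) is to hand out slice() views: set position and limit on a duplicate() around the line, then call slice(). The slice's index 0 is the line's first byte and its remaining() is the line's length, which is what the question's lineBuffer was meant to be. A minimal sketch, with the hypothetical names SliceLines, split() and view(), assuming \n, \r or \r\n terminators:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SliceLines {
    /** Splits the remaining bytes of buf on \n, \r or \r\n into slice views (no copying). */
    static List<ByteBuffer> split(ByteBuffer buf) {
        List<ByteBuffer> lines = new ArrayList<>();
        int start = buf.position();
        int i = start;
        while (i < buf.limit()) {
            byte b = buf.get(i); // absolute get: buf's own position is never moved
            if (b == '\n' || b == '\r') {
                lines.add(view(buf, start, i));
                // treat "\r\n" as a single terminator
                if (b == '\r' && i + 1 < buf.limit() && buf.get(i + 1) == '\n') {
                    i++;
                }
                start = i + 1;
            }
            i++;
        }
        if (start < buf.limit()) {
            lines.add(view(buf, start, buf.limit())); // trailing line without a terminator
        }
        return lines;
    }

    /** A buffer whose content is exactly bytes [from, to) of buf, sharing its storage. */
    static ByteBuffer view(ByteBuffer buf, int from, int to) {
        ByteBuffer dup = buf.duplicate();
        dup.limit(to);
        dup.position(from);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                "this is a line of text a\nthis is line 2b\r\nline 3".getBytes(StandardCharsets.UTF_8));
        for (ByteBuffer line : split(buf)) {
            System.out.println("LINE: |" + StandardCharsets.UTF_8.decode(line) + "|");
        }
    }
}
```

Because the slices share storage with the original buffer, changing the original bytes also changes what the slices see; copying with get() as in the answer avoids that at the cost of an allocation per line.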

Answer 3

I needed something similar but more general than splitting a single buffer. In my case I have multiple buffers; in fact, my code is a modification of Spring's StringDecoder, which can convert a Flux<DataBuffer> into a Flux<String>.
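With several buffers, a line (or even a single multi-byte UTF-8 character) can start in one buffer and end in the next, so the unfinished tail has to be carried over between buffers. The sketch below is not Spring's StringDecoder and uses no Spring or Reactor API; it is a framework-free illustration of the same idea with plain ByteBuffer chunks and a hypothetical ChunkedLineSplitter class, assuming UTF-8 text with \n or \r\n line endings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedLineSplitter {
    // bytes of the current, not yet terminated line (may span several chunks)
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Feeds one chunk; returns the lines it completes (split on '\n'). */
    public List<String> feed(ByteBuffer chunk) {
        List<String> lines = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == '\n') {
                lines.add(takePending());
            } else {
                pending.write(b);
            }
        }
        return lines;
    }

    /** Call once at end of input; returns the final unterminated line, if any. */
    public List<String> finish() {
        List<String> lines = new ArrayList<>();
        if (pending.size() > 0) {
            lines.add(takePending());
        }
        return lines;
    }

    private String takePending() {
        // decode only at a line boundary, so a multi-byte UTF-8 character split
        // across two chunks is reassembled before it is decoded
        String line = new String(pending.toByteArray(), StandardCharsets.UTF_8);
        pending.reset();
        // drop a trailing '\r' so "\r\n" endings work too
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) {
        ChunkedLineSplitter splitter = new ChunkedLineSplitter();
        // "line 1" and "line 2" are each split across chunk boundaries
        String[] chunks = {"li", "ne 1\nli", "ne 2\r\nline", " 3"};
        for (String c : chunks) {
            for (String line : splitter.feed(ByteBuffer.wrap(c.getBytes(StandardCharsets.UTF_8)))) {
                System.out.println("LINE: " + line);
            }
        }
        for (String line : splitter.finish()) {
            System.out.println("LINE: " + line);
        }
    }
}
```

Buffering raw bytes rather than decoding each chunk separately is the design choice that keeps split characters intact; decoding chunk by chunk would turn a character cut in half into replacement characters.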
