How to refer to part of an array?

Question

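Given a byte[] object, when we want to operate on it we often need only a piece of it. In my particular case I get a byte[] from the wire, where the first 4 bytes describe the length of the message, the next 4 bytes describe the type of the message (an integer that maps to a concrete protobuf class), and the remaining bytes are the actual content of the message, like this: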

length|type|content

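To parse this message I have to pass the content part to the specific class that knows how to parse an instance from it. The problem is that often no method is provided that lets you specify where in the array the parser should start reading.

So we end up copying the remaining chunk of the array, which is inefficient.

As far as I know, in Java it is not possible to create another byte[] reference that actually refers to part of a larger original byte[] using just 2 indexes (this was the approach String took, and it led to memory leaks).

I am wondering how we solve situations like this. It makes no sense to give up on protobuf just because it does not provide some parseFrom(byte[], int, int). Protobuf is just an example; any library could be missing such an API.

Does this force us to write inefficient code, or is there something that can be done (other than adding that method)?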

Answer 1

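In Java an array is not just a section of memory: it is an object with some additional fields (at least the length). So you cannot link to part of an array. You should either:

use array copy functions, or
implement and use an algorithm that works on only part of the byte array.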
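
The two options above can be put side by side in a minimal sketch. The class name SliceOptions and both method names are made up for illustration, and the "algorithm" is just a sum of unsigned byte values; only Arrays.copyOfRange is a real JDK call.

```java
import java.util.Arrays;

class SliceOptions {
    // Option 1: copy the range into a new array.
    // This allocates a new array and copies `length` bytes.
    static byte[] copyContent(byte[] message, int offset, int length) {
        return Arrays.copyOfRange(message, offset, offset + length);
    }

    // Option 2: run the algorithm directly on the range; no copy is made.
    // The algorithm here is a simple sum of unsigned byte values.
    static int sumRange(byte[] message, int offset, int length) {
        if (offset < 0 || length < 0 || offset > message.length - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length);
        }
        int sum = 0;
        for (int i = offset; i < offset + length; i++) {
            sum += message[i] & 0xFF; // mask to treat the byte as unsigned
        }
        return sum;
    }
}
```

For a message laid out as length|type|content, the content starts at offset 8, so copyContent(message, 8, n) produces the copy and sumRange(message, 8, n) works in place.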

Answer 2

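The problem seems to be that there is no way to create a view over an array (e.g., an array equivalent of List#subList()). A workaround might be to make your parsing methods take a reference to the entire array plus two indices (or an index and a length) that specify the sub-array the method should work on.

This would not prevent the methods from reading or modifying parts of the array they should not touch. If that is a concern, perhaps a ByteArrayView class could add a little safety: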

public class ByteArrayView {
  private final byte[] array;
  private final int start;
  private final int length;

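  public ByteArrayView(byte[] array, int start, int length) {
    // Reject ranges that do not fit inside the backing array.
    if (start < 0 || length < 0 || start > array.length - length) {
      throw new IndexOutOfBoundsException("start=" + start + ", length=" + length);
    }
    this.array = array;
    this.start = start;
    this.length = length;
  }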

  // Returns a single byte; the index is relative to the start of the view.
  public byte get(int index) {
    if (index < 0 || index >= length) {
      throw new ArrayIndexOutOfBoundsException(index);
    }
    return array[start + index];
  }
}

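On the other hand, if performance is a concern, a method that calls get() for every byte may be undesirable.

The code is for illustration only; it is not tested or anything.

EDIT

On a second reading of my own answer, I realized I should point this out: using ByteArrayView copies each byte you read from the original array, just byte by byte rather than as a chunk. It does not adequately address the OP's concern.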
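
For what it is worth, the JDK's java.nio.ByteBuffer can act as a bounds-checked view over part of an array without copying it. This is not part of the answer above, just a sketch of the same idea; the class name BufferViewSketch and the method view are made up. Whether it helps depends on whether the parser you call accepts a ByteBuffer, so check the library's API.

```java
import java.nio.ByteBuffer;

class BufferViewSketch {
    // Returns a read-only view of array[offset .. offset + length).
    // The view shares the backing array; no bytes are copied.
    static ByteBuffer view(byte[] array, int offset, int length) {
        return ByteBuffer.wrap(array, offset, length) // position = offset, limit = offset + length
                .slice()                               // index 0 of the slice is array[offset]
                .asReadOnlyBuffer();                   // writes through the view are rejected
    }
}
```

Reads through the view are relative to its start, so view(message, 8, n).get(0) returns message[8], and an index outside the view throws IndexOutOfBoundsException. Because the array is shared, later changes to the original array are visible through the view.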

Answer 3

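Usually you would handle this kind of thing with streams.

A stream is an abstraction for reading just what you need to process the current block of data. So you can read the correct number of bytes into a byte array and pass it to your parsing function.

You ask 'So does this force us to write inefficient code or there is something that can be done?'

Usually you get your data in the form of a stream, and then using the technique demonstrated below will be more performant because you skip making one copy. (Two copies instead of three: one by the OS and one by you. You skip making a copy of the total byte array before you start parsing.) If you actually start out with a byte[] but it is constructed by yourself, then you may want to construct an object such as { int length, int type, byte[] contentBytes } instead and pass contentBytes to your parsing function.

If you really, really have to start out with a byte[], then the technique below is just a more convenient way to parse it; it is not more performant.

So suppose you got a buffer of bytes from somewhere and you want to read its contents. First you convert it to a stream: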

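// Needs: import java.io.*; import java.util.*;
private static List<Content> read(byte[] buffer) {
    try {
        ByteArrayInputStream bytesStream = new ByteArrayInputStream(buffer);
        return read(bytesStream);
    } catch (IOException e) {
        // Malformed or truncated data still surfaces as an IOException.
        // Rethrow it unchecked so that every code path returns or throws.
        throw new UncheckedIOException(e);
    }
}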

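The function above wraps the byte array in a stream and passes it to the function that does the actual reading. If you can start out from a stream, then obviously you can skip the step above and pass that stream directly into the function below: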

private static List<Content> read(InputStream bytesStream) throws IOException {
    List<Content> results = new ArrayList<Content>();
    try {
        // read the content...
        Content content1 = readContent(bytesStream);
        results.add(content1);

        // I don't know if there's more than one content block but assuming
        // that there is, you can just continue reading the stream...
        //
        // If it's a fixed number of content blocks then just read them one
        // after the other... Otherwise make this a loop
        Content content2 = readContent(bytesStream);
        results.add(content2);
    } finally {
        bytesStream.close();
    }
    return results;
}
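
The comment in the function above suggests turning the reads into a loop when the number of content blocks is not fixed. A minimal sketch of such a loop follows, assuming the stream simply ends after the last block and that the header is two big-endian 4-byte integers. The names FrameLoopSketch, Frame and readAllFrames are hypothetical, and the sketch returns raw content bytes instead of parsed Content objects so that it stays self-contained.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class FrameLoopSketch {
    // One length|type|content block, with the content still unparsed.
    static final class Frame {
        final int type;
        final byte[] content;

        Frame(int type, byte[] content) {
            this.type = type;
            this.content = content;
        }
    }

    // Reads blocks until the stream ends cleanly at a block boundary.
    static List<Frame> readAllFrames(InputStream in) throws IOException {
        List<Frame> frames = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            // Read the first length byte on its own so that a clean end of
            // stream (-1) can be told apart from a truncated block.
            int first = data.read();
            if (first < 0) {
                break;
            }
            int length = (first << 24)
                    | (data.readUnsignedByte() << 16)
                    | (data.readUnsignedByte() << 8)
                    | data.readUnsignedByte();
            int type = data.readInt();
            if (length < 0) {
                throw new IOException("Negative content length: " + length);
            }
            byte[] content = new byte[length];
            data.readFully(content); // throws EOFException if the block is cut short
            frames.add(new Frame(type, content));
        }
        return frames;
    }
}
```

Each Frame could then be handed to a type-specific parser, in the same way the answer dispatches on the type field.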

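Since your byte array contains content blocks, you need to read them from the stream. Since you have a length field and a type field, I am assuming you have different kinds of content blocks. The next function reads the length and type and, depending on the type read, passes the content bytes on to the appropriate class: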

private static Content readContent(InputStream stream) throws IOException {
    final int CONTENT_TYPE_A = 10;
    final int CONTENT_TYPE_B = 11;

    // wrap the InputStream in a DataInputStream because the latter has
    // convenience functions to convert bytes to integers, etc.
    // Note that DataInputStream handles the stream in a BigEndian way,
    // so check that your bytes are in the same byte order. If not you'll
    // have to find another stream reader that can convert to ints from
    // LittleEndian byte order.
    DataInputStream data = new DataInputStream(stream);
    int length = data.readInt();
    int type = data.readInt();

    // I'm assuming that above length field was the number of bytes for the
    // content. So, read length number of bytes into a buffer and pass that
    // to your `parseFrom(byte[])` function.
    if (length < 0)
        throw new IOException("Negative content length: " + length);
    byte[] contentBytes = new byte[length];
    // A single read() call may return fewer bytes than requested even when
    // more data is coming. readFully() keeps reading until the buffer is
    // full and throws EOFException if the stream ends early.
    data.readFully(contentBytes);

    switch (type) {
        case CONTENT_TYPE_A:
            return ContentTypeA.parseFrom(contentBytes);
        case CONTENT_TYPE_B:
            return ContentTypeB.parseFrom(contentBytes);
        default:
            throw new UnsupportedOperationException();
    }
}

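I made up the following classes. I don't know what protobuf is, but apparently it can convert from a byte array to an actual object with its parseFrom(byte[]) function, so treat this as pseudocode: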

class Content {
    // common functionality
}

class ContentTypeA extends Content {
    public static ContentTypeA parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type A content 
    }
}

class ContentTypeB extends Content {
    public static ContentTypeB parseFrom(byte[] contentBytes) {
        return null; // do the actual parsing of a type B content
    }
}
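
The comment in readContent notes that DataInputStream reads integers in big-endian order. If the wire format turns out to be little-endian, one option in the JDK is java.nio.ByteBuffer with an explicit byte order. A minimal sketch, assuming the same 4-byte length followed by a 4-byte type; the names LittleEndianHeaderSketch and readHeader are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianHeaderSketch {
    // Returns { length, type } decoded from the first 8 bytes of the message.
    // wrap() does not copy the array; it only reads from it.
    static int[] readHeader(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message).order(ByteOrder.LITTLE_ENDIAN);
        int length = buf.getInt(); // bytes 0..3
        int type = buf.getInt();   // bytes 4..7
        return new int[] { length, type };
    }
}
```

A message shorter than 8 bytes makes getInt() throw BufferUnderflowException, so callers may want to check the array length first.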

java

arrays

protocol-buffers

zero-copy