Why is my implementation of `InputStream` not working with `StreamingReader` from `com.monitorjbl:xlsx-streamer:2.2.0`?

Due to the lack of activity on the library's GitHub site, I decided to post this question here, hoping to get some support.

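The problem I am working on is reading an Excel file in a streaming fashion. Specifically, the Excel file is stored as blobs in a SQLite database after being split into multiple rows using a fixed chunk size. For example, a 3MB file is split into three rows, each containing 1MB of raw data. The rows are properly ordered, so if I write the blob column of each row to the file system in order, I get a copy of the Excel file.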

Since StreamingReader works with InputStream, I decided to implement an InputStream on top of these rows in the SQLite database, so that StreamingReader can read directly from the DB.

I first build a Sequence<Byte> on top of the query result, yielding the bytes of all blob columns in order:

    fun blocksByteSequence(id: String): Sequence<Byte> {
        return sequence {
            val conn = source.connection
            val stmt = conn.createStatement()
            val r = stmt.executeQuery(findFileQuery(id))
            while (r.next()) yieldAll(r.getBytes(raw_data_column).asIterable())
            stmt.close()
            conn.close()
        }
    }

Turning the Sequence<Byte> into an InputStream is then fairly straightforward:

    class ByteSequenceInputStreamFactory(
        private val seq: Sequence<Byte>,
    ) {
        fun inputStreamProvider(): InputStream = object : InputStream() {
            private val iter = seq.iterator()
            override fun read(): Int {
                return if (iter.hasNext()) iter.next().toInt() else -1
            }
        }
    }

When I try to construct a StreamingReader with this InputStream, I get an error:

    val byteSeq = blocksByteSequence(id)
    val ins = ByteSequenceInputStreamFactory(byteSeq).inputStreamProvider()
    val reader = StreamingReader.builder().open(ins) // error

Error message:

Could not open the specified zip entry source stream
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Could not open the specified zip entry source stream
    at app//org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:212)
    at app//org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:194)
    ...
Caused by: java.util.zip.ZipException: invalid distances set
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFromInflater(ZipArchiveInputStream.java:586)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readDeflated(ZipArchiveInputStream.java:551)
   ...
Caused by: java.util.zip.DataFormatException: invalid distances set
    at java.base/java.util.zip.Inflater.inflateBytesBytes(Native Method)
    at java.base/java.util.zip.Inflater.inflate(Inflater.java:378)
   ...

However, if I dump all the bytes from SQLite into an Excel file at some path:

    val byteSeq = manager.blocksByteSequence(id)
    val out = java.nio.file.Path.of("./private/test.xlsx")
    out.outputStream().use { o -> byteSeq.forEach { o.write(it.toInt()) } }

and use the InputStream produced from that file, the error goes away:

    val reader = StreamingReader.builder().open(out.inputStream())

I think I have solved the problem.

The trouble is here:

    class ByteSequenceInputStreamFactory(
        private val seq: Sequence<Byte>,
    ) {
        fun inputStreamProvider(): InputStream = object : InputStream() {
            private val iter = seq.iterator()
            override fun read(): Int {
                return if (iter.hasNext()) iter.next().toInt() /* this is not OK */ else -1
            }
        }
    }

Calling Byte.toInt() does not produce the result that InputStream expects.

According to the Java documentation, the method InputStream.read():

Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

The tricky part is that the Int returned from Byte.toInt() is not guaranteed to be an int in the range 0 to 255.
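
To make the mismatch concrete, here is a minimal sketch that wraps a few bytes with the same read() logic as the failing class and reads them one at a time. The helper name naiveStream is made up for illustration. The data byte 0xFF comes back as -1, which a consumer cannot tell apart from end of stream, and 0x80 comes back as a negative value outside the promised range.

```kotlin
import java.io.InputStream

// Hypothetical helper: same read() logic as the failing implementation.
fun naiveStream(bytes: ByteArray): InputStream = object : InputStream() {
    private val iter = bytes.iterator()
    override fun read(): Int = if (iter.hasNext()) iter.next().toInt() else -1
}

fun main() {
    val ins = naiveStream(byteArrayOf(0x50, 0xFF.toByte(), 0x80.toByte()))
    println(ins.read()) // 80: bytes below 0x80 are fine
    println(ins.read()) // -1: the data byte 0xFF looks like end of stream
    println(ins.read()) // -128: outside the 0..255 range that read() promises
    println(ins.read()) // -1: the real end of stream
}
```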

In Kotlin, Byte:

Represents a 8-bit signed integer. On the JVM, non-nullable values of this type are represented as values of the primitive type byte.

The Byte.toInt() method:

Converts this Byte value to Int. The resulting Int value represents the same numerical value as this Byte. The least significant 8 bits of the resulting Int value are the same as the bits of this Byte value, whereas the most significant 24 bits are filled with the sign bit of this value.

Simply calling Byte.toInt() returns the signed integer value of the Byte. To get its 0-255 representation, I need to extract the least significant 8 bits like this:

    val the_0_255_int = someByte.toInt().and(0xff) // extract the last 8 bits
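
As a quick sanity check of the mask, the sketch below runs it over all 256 byte values and confirms that each result lands in 0..255 and converts back to the same byte. The function name toUnsignedInt is only for illustration. On Kotlin versions with stable unsigned types, toUByte().toInt() should give the same result, but that is not exercised here.

```kotlin
// Illustrative helper name; the body is the same mask as in the line above.
fun toUnsignedInt(b: Byte): Int = b.toInt().and(0xff)

fun main() {
    for (v in Byte.MIN_VALUE..Byte.MAX_VALUE) {
        val b = v.toByte()
        val u = toUnsignedInt(b)
        check(u in 0..255)     // fits the InputStream.read() contract
        check(u.toByte() == b) // the low 8 bits are unchanged
    }
    println(toUnsignedInt(0xFF.toByte())) // 255
    println(toUnsignedInt(0x80.toByte())) // 128
}
```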

So the correct code for my problem looks like this:

    class ByteSequenceInputStreamFactory(
        private val seq: Sequence<Byte>,
    ) {
        fun inputStreamProvider(): InputStream = object : InputStream() {
            private val iter = seq.iterator()
            override fun read(): Int {
                return if (iter.hasNext()) iter.next().toInt().and(0xff) else -1
            }
        }
    }
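
A possible alternative, not part of the code above: since each row already holds a whole ByteArray, the chunks could be wrapped in ByteArrayInputStream instances and chained with java.io.SequenceInputStream, which avoids the per-byte conversion altogether. The function name chunksToInputStream and the Sequence<ByteArray> input are assumptions for this sketch. In the setup above the chunks would presumably come from r.getBytes(raw_data_column), and closing the statement and connection would still need to be handled separately.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.io.SequenceInputStream
import java.util.Enumeration

// Hypothetical helper: chains one ByteArrayInputStream per chunk, in order.
fun chunksToInputStream(chunks: Sequence<ByteArray>): InputStream {
    val iter = chunks.iterator()
    val streams = object : Enumeration<InputStream> {
        override fun hasMoreElements(): Boolean = iter.hasNext()
        override fun nextElement(): InputStream = ByteArrayInputStream(iter.next())
    }
    return SequenceInputStream(streams)
}

fun main() {
    // Two fake "rows"; the second contains bytes with the high bit set.
    val chunks = sequenceOf(byteArrayOf(0x50, 0x4B), byteArrayOf(-1, -128))
    val all = chunksToInputStream(chunks).readBytes()
    println(all.contentEquals(byteArrayOf(0x50, 0x4B, -1, -128))) // true
}
```

Because the chunks are pulled lazily from the sequence, only a chunk or two should need to be in memory at a time, which matches the streaming intent of the original approach.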