Reading an Avro file from Scala
I am trying to read an Avro file from Scala.
I extracted the file's schema with avro-tools, saved it to a file, and then tried to read the data with the following code:
import java.nio.file.{ Files, Paths }
import org.apache.avro.Schema
import org.apache.avro.generic.{ GenericDatumReader, GenericRecord }
import org.apache.avro.io.DecoderFactory
val zibi = scala.io.Source.fromFile("/home/wasabi/schema").mkString
val schema_obj = new Schema.Parser
val schema2 = schema_obj.parse(zibi)
val READER2 = new GenericDatumReader[GenericRecord](schema2)
val myFile = Files.readAllBytes(Paths.get("/tmp/check/CMRF_80_1442744555901-1_1_2_1_1_1_4_10_1.avro"))
val datum = READER2.read(null, DecoderFactory.defaultFactory.createBinaryDecoder(myFile, null))
But I keep getting IOExceptions like this:
java.io.IOException: Invalid int encoding
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:145)
at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:444)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:159)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:219)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
When I read the file with avro-tools, it reads fine.
What am I doing wrong?
Try using DataFileReader instead of BinaryDecoder.
While Encoders/Decoders are meant for writing and reading raw Avro data, I suspect they are choking on the header information found in an Avro data file.
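One quick way to check that suspicion: per the Avro specification, an object container file starts with the four magic bytes 'O', 'b', 'j', 1, followed by the file metadata. A rough sketch of that check is below. The helper name hasAvroContainerMagic is made up for illustration and is not part of the Avro API.

```scala
// Hypothetical helper (not an Avro API): true if the bytes begin with the
// Avro object container file magic 'O', 'b', 'j', 1.
def hasAvroContainerMagic(header: Array[Byte]): Boolean =
  header.length >= 4 &&
    header(0) == 'O'.toByte &&
    header(1) == 'b'.toByte &&
    header(2) == 'j'.toByte &&
    header(3) == 1.toByte
```

If hasAvroContainerMagic(myFile) returns true for the byte array read in the question, the file is a container file rather than a bare binary datum, which would explain why the raw decoder fails on it.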
import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.{ GenericDatumReader, GenericRecord }
import org.apache.avro.file.DataFileReader
val zibi = scala.io.Source.fromFile("/home/wasabi/schema").mkString
val schema_obj = new Schema.Parser
val schema2 = schema_obj.parse(zibi)
val READER2 = new GenericDatumReader[GenericRecord](schema2)
// Open the data file with DataFileReader, which handles the file header
val myFile = new File("/tmp/check/CMRF_80_1442744555901-1_1_2_1_1_1_4_10_1.avro")
val dataFileReader = new DataFileReader[GenericRecord](myFile, READER2)
// Reads the first record only
val datum = dataFileReader.next()
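The snippet above only pulls the first record and leaves the reader open. If you want every record, something along these lines may work. It is a sketch, assuming the file is small enough to hold in memory; readAllRecords is a name invented here. It also skips the separately saved schema, since DataFileReader can take the writer schema from the file header when the GenericDatumReader is built without one.

```scala
import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{ GenericDatumReader, GenericRecord }

// Hypothetical helper: loads every record of an Avro data file into memory.
def readAllRecords(path: String): Vector[GenericRecord] = {
  // No schema passed: the reader uses the writer schema stored in the file header.
  val datumReader = new GenericDatumReader[GenericRecord]()
  val fileReader = new DataFileReader[GenericRecord](new File(path), datumReader)
  try {
    val records = Vector.newBuilder[GenericRecord]
    // next() without a reuse argument returns a fresh record each time
    while (fileReader.hasNext) records += fileReader.next()
    records.result()
  } finally fileReader.close() // release the file handle even on failure
}
```

Calling readAllRecords on the .avro path from the question should return all of its records. For large files, processing each record inside the loop instead of collecting them is probably the better choice.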