Avro not populating square brackets for Array type
I have the following Avro schema:
{
  "name": "schema_name",
  "type": "record",
  "fields": [
    {
      "name": "schema",
      "type": "string"
    },
    {
      "name": "data",
      "type": {
        "type": "array",
        "items": {
          "name": "data",
          "type": "record",
          "fields": [
            {
              "name": "phone_number",
              "type": "string"
            }
          ]
        }
      }
    },
    {
      "name": "flag",
      "type": "string"
    }
  ]
}
I am using it to generate Avro messages from a text file:
import scala.io.Source
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

def main(args: Array[String]): Unit = {
  val avroSchemaStr = Source.fromFile("avro_schema.txt").mkString
  val avroSchema = new Schema.Parser().parse(avroSchemaStr)
  Source.fromFile("phone_numbers.txt").getLines.foreach { msg =>
    println(fixedWidthToAvro(msg, avroSchema))
  }
}

def fixedWidthToAvro(record: String, avroSchema: Schema): GenericRecord = {
  val childSchema = new GenericData.Record(avroSchema).getSchema.getField("data").schema.getElementType
  val parentRecord = new GenericData.Record(avroSchema)
  val childRecord = new GenericData.Record(childSchema)
  childRecord.put("phone_number", "1234567890")
  parentRecord.put("schema", "schema_name")
  parentRecord.put("data", childRecord)
  parentRecord.put("flag", "I")
  println(parentRecord)
  parentRecord
}
Everything runs without errors, and I get the following output for the given message:
{"schema": "schema_name", "data": {"phone_number": "1234567890"}, "flag": "I"}
However, since I declared the type of the data field as array, I expected it to be enclosed in square brackets, like a collection. Something like:
{"schema": "schema_name", "data": [{"phone_number": "1234567890"}], "flag": "I"}
I want the data field to be wrapped in square brackets. How can I achieve this?
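There are two elements named data in the record. One is the array field, and the other is the name of the record type stored inside that array. I think this is where your confusion comes from.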
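To make the two levels concrete, here is a minimal sketch that parses the schema from the question and inspects the data field. The object name InspectDataField and the inlined schemaJson value are assumptions made for illustration; only Schema.Parser, getField, schema, getType, getElementType and getName from the Avro Java API are used.

```scala
import org.apache.avro.Schema

object InspectDataField {
  // The schema from the question, inlined so the sketch is self-contained.
  val schemaJson: String =
    """{
      |  "name": "schema_name",
      |  "type": "record",
      |  "fields": [
      |    {"name": "schema", "type": "string"},
      |    {"name": "data", "type": {
      |      "type": "array",
      |      "items": {
      |        "name": "data",
      |        "type": "record",
      |        "fields": [{"name": "phone_number", "type": "string"}]
      |      }
      |    }},
      |    {"name": "flag", "type": "string"}
      |  ]
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val avroSchema = new Schema.Parser().parse(schemaJson)

    // Outer "data": the schema of the field, which is an array.
    val fieldSchema = avroSchema.getField("data").schema
    // Inner "data": the record type of each element in that array.
    val elementSchema = fieldSchema.getElementType

    println(fieldSchema.getType == Schema.Type.ARRAY)    // is the field an array?
    println(elementSchema.getType == Schema.Type.RECORD) // is each element a record?
    println(elementSchema.getName)                       // name of the element record type
  }
}
```

A value put into the data field should therefore match fieldSchema (an array), while each item added to that array should match elementSchema (a record).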
When you pass schema.getElementType to Record, you create a single record, but you never create an Array[Record] to hold those records.
What you need is an array that can hold all of your records:
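import org.apache.avro.Schema
import org.apache.avro.generic.GenericData

// `schema` is the schema JSON as a String (avroSchemaStr in the question).
val avroSchema = new Schema.Parser().parse(schema)
// Schema of the "data" field itself, i.e. the array (not its element type).
val childSchema = new GenericData.Record(avroSchema).getSchema.getField("data").schema
val parentRecord = new GenericData.Record(avroSchema)
// The array that holds the child records; 1024 is only the initial capacity.
val childRecords = new GenericData.Array[GenericData.Record](1024, childSchema)
// Each element is a record built from the array's element type.
val childRecord = new GenericData.Record(childSchema.getElementType)
childRecord.put("phone_number", "33333")
childRecords.add(childRecord)
parentRecord.put("schema", "schema_name")
parentRecord.put("data", childRecords)
parentRecord.put("flag", "I")
println(parentRecord)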
This yields:
{"schema": "schema_name", "data": [{"phone_number": "33333"}], "flag": "I"}
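Since the question reads phone numbers from a file, the same idea can be wrapped in a helper that takes any number of phone numbers and fills the array with one child record each. This is only a sketch under assumptions: the object BuildPhoneArray, the method toAvro and the inlined schemaJson are hypothetical names introduced here, and the array is sized from the input instead of using a fixed capacity.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

object BuildPhoneArray {
  // The schema from the question, inlined so the sketch is self-contained.
  val schemaJson: String =
    """{"name": "schema_name", "type": "record", "fields": [
      |  {"name": "schema", "type": "string"},
      |  {"name": "data", "type": {"type": "array", "items":
      |    {"name": "data", "type": "record",
      |     "fields": [{"name": "phone_number", "type": "string"}]}}},
      |  {"name": "flag", "type": "string"}
      |]}""".stripMargin

  // Hypothetical helper: one child record per phone number, all held in one Avro array.
  def toAvro(avroSchema: Schema, phoneNumbers: Seq[String]): GenericRecord = {
    val arraySchema = avroSchema.getField("data").schema // schema of the array field
    val elementSchema = arraySchema.getElementType       // schema of each element record

    // The capacity argument is an initial size hint; the array grows as items are added.
    val children = new GenericData.Array[GenericRecord](phoneNumbers.size, arraySchema)
    phoneNumbers.foreach { number =>
      val child = new GenericData.Record(elementSchema)
      child.put("phone_number", number)
      children.add(child)
    }

    val parent = new GenericData.Record(avroSchema)
    parent.put("schema", "schema_name")
    parent.put("data", children) // the array, not a single record
    parent.put("flag", "I")
    parent
  }

  def main(args: Array[String]): Unit = {
    val avroSchema = new Schema.Parser().parse(schemaJson)
    // Prints the record with both phone numbers inside the bracketed data array.
    println(toAvro(avroSchema, Seq("1234567890", "33333")))
  }
}
```

Keeping the array schema and the element schema in separately named values avoids mixing up the two things called data.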