Create an Avro schema from complex JSON containing a map (key-value pairs)
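I have a JSON document and want to create an Avro schema for serializing and deserializing its data.
I wrote an Avro schema for the JSON document shown below, but when I serialize the JSON data against that schema, the schema parser throws an exception. I have read a lot about Avro and its data types but could not resolve the problem.
Below are the JSON document, the Avro schema, and the exception thrown by the schema parser.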
1) JSON document
{
  "category": "test",
  "values": [
    {
      "subscriberid": 87392,
      "simserialnumber": 923,
      "MCC": 33,
      "MNC": [
        {
          "mn": {"key1": "kunal", "key2": "gupta"},
          "mc": 44
        }
      ],
      "countryiso": "IN",
      "operatorname": "vodadone"
    }
  ]
}
2) Avro schema
{
  "type": "record",
  "namespace": "testavro.schema",
  "name": "test",
  "fields": [
    {
      "type": "string",
      "name": "data_version"
    },
    {
      "type": "string",
      "name": "ip_address"
    },
    {
      "type": "string",
      "name": "category"
    },
    {
      "type": {
        "items": {
          "fields": [
            {
              "type": "int",
              "name": "simserialnumber"
            },
            {
              "type": "string",
              "name": "countryiso"
            },
            {
              "type": "int",
              "name": "MCC"
            },
            {
              "type": "int",
              "name": "subscriberid"
            },
            {
              "type": {
                "items": {
                  "fields": [
                    {
                      "fields": [
                        {
                          "type": "string",
                          "name": "key2"
                        },
                        {
                          "type": "string",
                          "name": "key1"
                        }
                      ],
                      "type": "record",
                      "name": "mn"
                    },
                    {
                      "type": "int",
                      "name": "mc"
                    }
                  ],
                  "type": "record",
                  "name": "MNC_records"
                },
                "type": "array"
              },
              "name": "MNC"
            },
            {
              "type": "string",
              "name": "operatorname"
            }
          ],
          "type": "record",
          "name": "values_records"
        },
        "type": "array"
      },
      "name": "values"
    }
  ]
}
3) SchemaParseException
SchemaParseException: Type property "{u'items': {u'fields': [{u'type': u'int', u'name': u'simserialnumber'}, {u'type': u'string', u'name': u'countryiso'}, {u'type': u'int', u'name': u'MCC'}, {u'type': u'int', u'name': u'subscriberid'}, {u'type': {u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}, u'name': u'MNC'}, {u'type': u'string', u'name': u'operatorname'}], u'type': u'record', u'name': u'values_records'}, u'type': u'array'}" not a valid Avro schema: Items schema ({u'fields': [{u'type': u'int', u'name': u'simserialnumber'}, {u'type': u'string', u'name': u'countryiso'}, {u'type': u'int', u'name': u'MCC'}, {u'type': u'int', u'name': u'subscriberid'}, {u'type': {u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}, u'name': u'MNC'}, {u'type': u'string', u'name': u'operatorname'}], u'type': u'record', u'name': u'values_records'}) not a valid Avro schema: Type property "{u'items': {u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}, u'type': u'array'}" not a valid Avro schema: Items schema ({u'fields': [{u'fields': [{u'type': u'string', u'name': u'key2'}, {u'type': u'string', u'name': u'key1'}], u'type': u'record', u'name': u'mn'}, {u'type': u'int', u'name': u'mc'}], u'type': u'record', u'name': u'MNC_records'}) not a valid Avro schema: Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record. 
(known names: [u'testavro.schema.MNC_records', u'testavro.schema.test', u'testavro.schema.values_records']) (known names: [u'testavro.schema.MNC_records', u'testavro.schema.test', u'testavro.schema.values_records'])
Please help. I have spent an entire day on this JSON document and Avro schema without success.
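There appears to be a mistake in the items type of the MNC field in your values_records record. The mn field puts "type": "record" and its "fields" list directly on the field object, so the parser reads "record" as the name of an already-defined type and cannot find one (hence "Could not make an Avro Schema object from record"). A field's "type" has to hold the complete record definition. Wrapping the definition of mn in a new object works: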
{
  "type": "record",
  "namespace": "testavro.schema",
  "name": "test",
  "fields": [
    {
      "type": "string",
      "name": "data_version"
    },
    {
      "type": "string",
      "name": "ip_address"
    },
    {
      "type": "string",
      "name": "category"
    },
    {
      "type": {
        "items": {
          "fields": [
            {
              "type": "int",
              "name": "simserialnumber"
            },
            {
              "type": "string",
              "name": "countryiso"
            },
            {
              "type": "int",
              "name": "MCC"
            },
            {
              "type": "int",
              "name": "subscriberid"
            },
            {
              "type": {
                "items": {
                  "fields": [
                    {
                      "type": {
                        "type": "record",
                        "fields": [
                          {
                            "type": "string",
                            "name": "key2"
                          },
                          {
                            "type": "string",
                            "name": "key1"
                          }
                        ],
                        "name": "Mn"
                      },
                      "name": "mn"
                    },
                    {
                      "type": "int",
                      "name": "mc"
                    }
                  ],
                  "type": "record",
                  "name": "MNC_records"
                },
                "type": "array"
              },
              "name": "MNC"
            },
            {
              "type": "string",
              "name": "operatorname"
            }
          ],
          "type": "record",
          "name": "values_records"
        },
        "type": "array"
      },
      "name": "values"
    }
  ]
}
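To spot this kind of mistake before handing a schema to the parser, a rough check along the following lines may help. It is only a sketch using the Python standard library, not the Avro parser itself: the function name find_misplaced_definitions and the two trimmed-down schemas (broken, fixed) are made up for illustration, and the check covers only the one pattern discussed here, a field whose "type" is the bare string "record", "array", "map", "enum" or "fixed" instead of a full definition object.

```python
import json

# Complex Avro type names that need a full definition object, not a bare string.
COMPLEX_TYPES = {"record", "array", "map", "enum", "fixed"}


def find_misplaced_definitions(schema, path="$"):
    """Return paths of record fields whose "type" is a bare complex type name.

    Hypothetical helper for illustration; it is not part of any Avro library.
    """
    problems = []
    if isinstance(schema, list):  # union: check every branch
        for i, branch in enumerate(schema):
            problems += find_misplaced_definitions(branch, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                field_path = f"{path}.{field.get('name')}"
                field_type = field.get("type")
                if isinstance(field_type, str) and field_type in COMPLEX_TYPES:
                    # e.g. {"name": "mn", "type": "record", "fields": [...]}
                    problems.append(field_path)
                else:
                    problems += find_misplaced_definitions(field_type, field_path)
        elif kind == "array":
            problems += find_misplaced_definitions(schema.get("items"), path + "[]")
        elif kind == "map":
            problems += find_misplaced_definitions(schema.get("values"), path + "{}")
    return problems


# Trimmed-down version of the MNC_records definition from the question.
broken = json.loads("""
{"type": "record", "name": "MNC_records", "fields": [
  {"name": "mn", "type": "record", "fields": [
    {"name": "key1", "type": "string"},
    {"name": "key2", "type": "string"}]},
  {"name": "mc", "type": "int"}]}
""")

# Same record with the mn definition wrapped under "type", as in the answer.
fixed = json.loads("""
{"type": "record", "name": "MNC_records", "fields": [
  {"name": "mn", "type": {"type": "record", "name": "Mn", "fields": [
    {"name": "key1", "type": "string"},
    {"name": "key2", "type": "string"}]}},
  {"name": "mc", "type": "int"}]}
""")

print(find_misplaced_definitions(broken))  # ['$.mn']
print(find_misplaced_definitions(fixed))   # []
```

Run against the full schemas, the same function would be expected to flag only the mn field in the question's version. A clean result here does not mean the schema is valid Avro; the real parser remains the authority.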