Avro schema doesn't honor backward compatibility
I have this Avro schema
{
  "namespace": "xx.xxxx.xxxxx.xxxxx",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
    {"name": "filed1", "type": "string"},
    {"name": "filed2", "type": "long"},
    {"name": "filed3", "type": "boolean"},
    {
      "name": "metrics",
      "type": {
        "type": "array",
        "items": {
          "name": "MyRecord",
          "type": "record",
          "fields": [
            {"name": "min", "type": "long"},
            {"name": "max", "type": "long"},
            {"name": "sum", "type": "long"},
            {"name": "count", "type": "long"}
          ]
        }
      }
    }
  ]
}
This is the code we use to parse the data
public static final MyPayLoad parseBinaryPayload(byte[] payload) {
    DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    MyPayLoad myPayLoad = null;
    try {
        myPayLoad = payloadReader.read(null, decoder);
    } catch (IOException e) {
        logger.log(Level.SEVERE, e.getMessage(), e);
    }
    return myPayLoad;
}
Now I want to add one more field to the schema, so the schema looks like this
{
  "namespace": "xx.xxxx.xxxxx.xxxxx",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
    {"name": "filed1", "type": "string"},
    {"name": "filed2", "type": "long"},
    {"name": "filed3", "type": "boolean"},
    {
      "name": "metrics",
      "type": {
        "type": "array",
        "items": {
          "name": "MyRecord",
          "type": "record",
          "fields": [
            {"name": "min", "type": "long"},
            {"name": "max", "type": "long"},
            {"name": "sum", "type": "long"},
            {"name": "count", "type": "long"}
          ]
        }
      }
    }
    {"name": "agentType", "type": ["null", "string"], "default": "APP_AGENT"}
  ]
}
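Note the added field, which also has a default value defined. The problem is that when we receive data written with the old schema, I get this error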
java.io.EOFException: null
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
From what I understand from this document, this should be backward compatible, but somehow it doesn't seem to be. Any idea what I am doing wrong?
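I can see two possible issues in your schema
- The default value has only ever worked for me when it is null. For a union type, Avro expects the default to match the first branch of the union, which is "null" here, so "APP_AGENT" is not a valid default for ["null", "string"].
To specify this you need to set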
"default": null
- Also, in your schema you forgot to add a , (field separator) between the array and the new field. So try changing your schema to
{
  "namespace": "xx.xxxx.xxxxx.xxxxx",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
    {"name": "filed1", "type": "string"},
    {"name": "filed2", "type": "long"},
    {"name": "filed3", "type": "boolean"},
    {
      "name": "metrics",
      "type": {
        "type": "array",
        "items": {
          "name": "MyRecord",
          "type": "record",
          "fields": [
            {"name": "min", "type": "long"},
            {"name": "max", "type": "long"},
            {"name": "sum", "type": "long"},
            {"name": "count", "type": "long"}
          ]
        }
      }
    },
    {"name": "agentType", "type": ["null", "string"], "default": null}
  ]
}
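I finally got this working. I needed to give both schemas to the SpecificDatumReader.
So I modified the parsing like this, passing both the old schema and the new schema to the reader, and it worked like a charm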
public static final MyPayLoad parseBinaryPayload(byte[] payload) {
    // First argument is the writer (old) schema, second is the reader (new) schema.
    DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    MyPayLoad myPayLoad = null;
    try {
        myPayLoad = payloadReader.read(null, decoder);
    } catch (IOException e) {
        logger.log(Level.SEVERE, e.getMessage(), e);
    }
    return myPayLoad;
}
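As for why passing both schemas helps: Avro's binary encoding carries no field names or tags, so the bytes can only be interpreted by following the schema they were written with. A reader built from MyPayLoad.class alone treats the new schema as the writer schema too, so on an old payload it goes looking for the agentType union index past the end of the data, which is consistent with the readIndex and EOFException frames in the stack trace. The sketch below is not Avro code. It hand-rolls a simplified version of the encoding in plain Java to illustrate the effect; the class AvroResolutionSketch and its methods are hypothetical, and it assumes the metrics array is always empty.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; real code should use Avro's DatumReader.
final class AvroResolutionSketch {

    // Avro writes int/long values as zigzag-encoded varints.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Old (V1) record: fields are simply concatenated, with no names or tags.
    static byte[] encodeV1(String filed1, long filed2, boolean filed3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, filed1);
        writeLong(out, filed2);
        out.write(filed3 ? 1 : 0);
        writeLong(out, 0); // metrics: a block count of 0 ends the (empty) array
        return out.toByteArray();
    }

    // Minimal read cursor that fails with EOFException when the bytes run out.
    static final class Cursor {
        final byte[] buf;
        int pos;

        Cursor(byte[] buf) { this.buf = buf; }

        int next() throws EOFException {
            if (pos >= buf.length) throw new EOFException();
            return buf[pos++] & 0xFF;
        }

        long readLong() throws EOFException {
            long z = 0;
            int shift = 0;
            int b;
            do {
                b = next();
                z |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return (z >>> 1) ^ -(z & 1);
        }

        String readString() throws EOFException {
            int len = (int) readLong();
            if (pos + len > buf.length) throw new EOFException();
            String s = new String(buf, pos, len, StandardCharsets.UTF_8);
            pos += len;
            return s;
        }
    }

    // Reads a payload as the new (V2) record and returns agentType.
    // writerHasAgentType says what the reader believes about the writer schema:
    // true  = writer assumed to be V2 (what happens with only the reader schema),
    // false = writer known to be V1 (what passing both schemas tells the reader).
    static String decodeAgentType(byte[] payload, boolean writerHasAgentType) throws EOFException {
        Cursor c = new Cursor(payload);
        c.readString(); // filed1
        c.readLong();   // filed2
        c.next();       // filed3
        if (c.readLong() != 0) {
            throw new IllegalStateException("sketch only handles an empty metrics array");
        }
        if (!writerHasAgentType) {
            return null; // field absent in the data: use the reader schema's default
        }
        long branch = c.readLong(); // union index: 0 = null, 1 = string
        return branch == 0 ? null : c.readString();
    }

    public static void main(String[] args) throws EOFException {
        byte[] oldPayload = encodeV1("a", 42L, true);
        System.out.println("writer known to be V1: agentType = "
                + decodeAgentType(oldPayload, false));
        try {
            decodeAgentType(oldPayload, true);
        } catch (EOFException e) {
            System.out.println("writer assumed to be V2: " + e);
        }
    }
}
```

In this sketch the default only kicks in once the reader knows the field was never written, which is the information the writer schema supplies.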
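I am facing the same situation. Reading data written with the older schema fails when using the newer schema. The newer schema has just one additional field, a union with a default set.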
"type":["null","string"],"doc":"","default":null
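Even though the default is set, the null value is not filled in automatically during reading. Both the writer schema and the reader schema need to be provided when reading. My understanding was that Avro is backward compatible and should be able to handle newly added columns without needing the old schema.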