Encoding optional strings with AVRO

I am using Avro version 1.10.2.

I have the following schema, in which optionalField is an optional string (a union of null and string):

{
  "namespace": "foo.bar",
  "name": "FooBar",
  "type": "record",
  "fields": [
    {
      "name": "optionalField",
      "type": [
        "null",
        "string"
      ]
    }
  ]
}

I am using the Avro Maven plugin for code generation.

However, when I encode an instance of this class with the following code:

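import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;
import foo.bar.FooBar; // class generated by the Avro Maven plugin

FooBar fooBar = FooBar.newBuilder()
                .setOptionalField("value")
                .build();

Schema schema = fooBar.getSchema();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
// jsonEncoder() and write() can throw IOException; the enclosing method must handle or declare it
Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, byteArrayOutputStream);
SpecificDatumWriter<FooBar> writer = new SpecificDatumWriter<>(schema);
writer.write(fooBar, jsonEncoder);
jsonEncoder.flush();

System.out.println(byteArrayOutputStream.toString());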

The output is:

{"optionalField":{"string":"value"}}

instead of what I expected:

{"optionalField":"value"}

As far as I can tell, the Avro specification does not imply that only records can be optional. Furthermore, under Unions:

Unions, as mentioned above, are represented using JSON arrays. For example, ["null", "string"] declares a schema which may be either a null or string.

Is my understanding correct, and does Avro really allow optional string fields? Is this a bug? What am I missing?

Is my understanding correct and Avro really allows for optional string fields?

Yes, Avro supports a union of null and string.
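To make that concrete, here is a hand-rolled sketch of how the Avro specification's binary encoding writes a value of type ["null", "string"]: the zero-based branch index comes first as a zig-zag varint long, followed by the value (no bytes for null; a length plus UTF-8 bytes for a string). The class and method names are hypothetical and this is not the Avro library's encoder; it only illustrates that every encoding of a union records which branch was chosen. Since a record is encoded as its fields in declaration order, a raw binary encoding of FooBar would presumably consist of just these bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the spec's binary encoding for the union ["null", "string"].
// Not the Avro library: it hand-rolls the rules to show the branch index is always written.
final class OptionalStringBinarySketch {

    // Avro ints/longs are zig-zag encoded, then written as variable-length base-128 bytes.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes -> small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits with continuation bit set
            z >>>= 7;
        }
        out.write((int) z); // final byte, continuation bit clear
    }

    // Encodes a ["null", "string"] value: branch 0 for null, branch 1 for string.
    static byte[] encodeOptionalString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0); // branch 0 = "null"; null itself contributes no bytes
        } else {
            writeLong(out, 1); // branch 1 = "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length); // a string is its byte length followed by UTF-8 bytes
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeOptionalString(null)));    // 00
        System.out.println(hex(encodeOptionalString("value"))); // 02 0a 76 61 6c 75 65
    }
}
```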

What am I missing?

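The Avro JSON encoder works differently from what you expect. As described in https://avro.apache.org/docs/current/spec.html#json_encoding, a union is encoded together with its type information rather than as the bare value: a null value is written as JSON null, and any other value is written as a JSON object with a single name/value pair whose name is the branch's type name. That is why you get {"string":"value"}. There is an outstanding issue in the Avro ticket tracker that asks for the format you are looking for, but it has not been resolved: https://issues.apache.org/jira/browse/AVRO-1582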
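The following hand-rolled sketch contrasts the specification's rule with the unwrapped form requested in AVRO-1582, using the FooBar record from the question. Class and method names are hypothetical and the string escaping is deliberately minimal; this is not Avro's JsonEncoder, only an illustration of the two output shapes.

```java
// Hypothetical sketch: spec-style JSON for a ["null", "string"] value versus the
// unwrapped form asked for in AVRO-1582. Hand-rolled; not the Avro encoder.
final class OptionalStringJsonSketch {

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Spec rule: null -> JSON null; any other branch -> {"<branch type name>": value}.
    static String specUnion(String value) {
        return value == null ? "null" : "{" + quote("string") + ":" + quote(value) + "}";
    }

    // The form the question expected: the bare value, with no branch name.
    static String unwrappedUnion(String value) {
        return value == null ? "null" : quote(value);
    }

    // Wraps an already-encoded field value in the single-field FooBar record.
    static String record(String fieldValueJson) {
        return "{" + quote("optionalField") + ":" + fieldValueJson + "}";
    }

    public static void main(String[] args) {
        System.out.println(record(specUnion("value")));      // {"optionalField":{"string":"value"}}
        System.out.println(record(unwrappedUnion("value"))); // {"optionalField":"value"}
        System.out.println(record(specUnion(null)));         // {"optionalField":null}
    }
}
```

One plausible reason for the wrapper is that the bare form is ambiguous in general: in ["int", "long"] both branches are JSON numbers, and in ["string", "bytes"] both are JSON strings, so a decoder reading only the bare value could not tell which branch was written.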