Apache Avro generates wrong Avro schema from Java POJO with @AvroSchema
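I have a simple POJO with dates that is written to storage as Avro before being imported into Google BigQuery. The dates are converted to longs, and I am trying to use @AvroSchema to override the schema generated for the date fields so that BigQuery understands the field types.
The simple POJO: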
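import java.io.Serializable;
import org.apache.avro.reflect.AvroSchema;

public class SomeAvroMessage implements Serializable {
    // Override the reflected schema so each long is tagged as a timestamp.
    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long tm;

    @AvroSchema("{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}")
    private long created;

    public SomeAvroMessage() {
    }
}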
This ends up producing the following Avro schema:
{"type":"record","name":"SomeAvroMessage",
"namespace":"some.namespace",
"fields":[
{"name":"tm","type":{"type":"long","logicalType":"timestamp-millis"}},
{"name":"created","type":{"type":"long","logicalType":"timestamp-millis"}}
]}
These fields look wrong to me; I expected just {"name":"tm","type":"long","logicalType":"timestamp-millis"}
This is used in Google Dataflow with Apache Beam 2.22, written in Java.
Am I missing something?
The value {"name":"tm","type":{"type":"long","logicalType":"timestamp-millis"}} is correct. Expanded into clearer pseudocode, it is:
Field {
    name: "tm",
    type: Schema {
        type: "long",
        logicalType: "timestamp-millis"
    }
}
You can see that the field has a name and a type. The type of an Avro field must itself be an Avro schema. The logicalType attribute lives inside that schema, not next to it.
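One way to check this yourself is to parse both variants with the Avro Java library and ask each field's schema for its logical type. The following is a minimal sketch, assuming Avro is on the classpath; the class name `LogicalTypeLocation` and its helper method are made up for illustration. As far as I understand the parser, an attribute placed next to `type` at field level is kept only as a plain field property, so the field's schema carries no logical type in that case.

```java
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;

// Hypothetical helper class; the name and methods are illustrative only.
class LogicalTypeLocation {

    // The schema from the question: logicalType sits inside the field's type.
    static final String NESTED =
        "{\"type\":\"record\",\"name\":\"SomeAvroMessage\",\"namespace\":\"some.namespace\","
      + "\"fields\":[{\"name\":\"tm\","
      + "\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}}]}";

    // The variant the question expected: logicalType sits next to the type.
    static final String FLAT =
        "{\"type\":\"record\",\"name\":\"SomeAvroMessage\",\"namespace\":\"some.namespace\","
      + "\"fields\":[{\"name\":\"tm\",\"type\":\"long\","
      + "\"logicalType\":\"timestamp-millis\"}]}";

    // Returns the logical type name attached to the field's schema, or null if there is none.
    static String logicalTypeNameOf(String recordJson, String fieldName) {
        // Use a fresh parser per call: a single parser rejects a second
        // definition of the same record name.
        Schema record = new Schema.Parser().parse(recordJson);
        LogicalType logicalType = record.getField(fieldName).schema().getLogicalType();
        return logicalType == null ? null : logicalType.getName();
    }

    public static void main(String[] args) {
        System.out.println("nested: " + logicalTypeNameOf(NESTED, "tm"));
        System.out.println("flat:   " + logicalTypeNameOf(FLAT, "tm"));
    }
}
```

If the nested variant reports a logical type and the flat one does not, that confirms the generated schema is the form Avro readers expect.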
This is stated in the documentation:
A logical type is an Avro primitive or complex type with extra
attributes to represent a derived type. The attribute logicalType must
always be present for a logical type, and is a string with the name of
one of the logical types listed later in this section. Other
attributes may be defined for particular logical types.
The documentation also gives an example of the date type in an Avro schema:
{
    "type": "int",
    "logicalType": "date"
}
In short, your schema is correct, and whenever you need a logical type you can build your schema the same way.
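If you would rather build the schema in code than through annotations, the same shape can be produced with Avro's `LogicalTypes` and `SchemaBuilder`. This is only a sketch under the assumption that the Avro Java library is available; the class name `TimestampSchemaSketch` is hypothetical, and the record name and namespace are copied from the question.

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Hypothetical class name, used only for this illustration.
class TimestampSchemaSketch {

    // Builds a record whose fields are longs annotated as timestamp-millis.
    static Schema build() {
        // The logical type is attached to the long schema itself,
        // which is why it shows up nested under "type" in the JSON.
        Schema timestampMillis =
            LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));

        return SchemaBuilder.record("SomeAvroMessage")
            .namespace("some.namespace")
            .fields()
            .name("tm").type(timestampMillis).noDefault()
            .name("created").type(timestampMillis).noDefault()
            .endRecord();
    }

    // Returns the logical type name of the given field in the built record.
    static String logicalTypeNameOf(String fieldName) {
        return build().getField(fieldName).schema().getLogicalType().getName();
    }

    public static void main(String[] args) {
        // Pretty-print the schema to compare it with the one in the question.
        System.out.println(build().toString(true));
    }
}
```

Printing the result should show `logicalType` nested inside each field's `type`, matching what `@AvroSchema` generated.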