Reading Avro files in a Java application with Hortonworks Schema Registry
I have an application that writes files in Avro format (multiple records per file), but I can't read them in another Java application. Here is what I tried:
Map<String, Object> registryConfig = new HashMap<>();
registryConfig.put("schema.registry.client.class.loader.cache.size", 10L);
registryConfig.put("schema.registry.url", "http://localhost:9090/api/v1");
registryConfig.put("schema.registry.client.class.loader.cache.expiry.interval.secs", 10L);
registryConfig.put("schema.registry.deserializer.schema.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.metadata.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.text.cache.expiry.interval.secs", 10000L);
registryConfig.put("schema.registry.client.schema.version.cache.expiry.interval.secs", 10000L);
registryConfig.put("schema.registry.client.schema.metadata.cache.expiry.interval.secs", 10L);
registryConfig.put("specific.avro.reader", false);
registryConfig.put("schema.registry.client.schema.version.cache.size", 10L);
registryConfig.put("schema.registry.client.schema.version.text.size", 10L);
registryConfig.put("schemaregistry.deserializer.schema.cache.expiry.secs", 10000L);
SchemaRegistryClient registryClient = new SchemaRegistryClient(registryConfig);
AvroSnapshotDeserializer deserializer = new AvroSnapshotDeserializer(registryClient);
deserializer.init(registryConfig);
Path p = Paths.get("/tmp/dump.avro");
InputStream is = Files.newInputStream(p);
deserializer.deserialize(is);
But it throws:
Exception in thread "main" com.hortonworks.registries.schemaregistry.serdes.avro.exceptions.AvroException: Unknown protocol id [79] received while deserializing the payload
at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.checkProtocolHandlerExists(AvroSnapshotDeserializer.java:70)
at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:63)
at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer.retrieveProtocolId(AvroSnapshotDeserializer.java:32)
at com.hortonworks.registries.schemaregistry.serde.AbstractSnapshotDeserializer.deserialize(AbstractSnapshotDeserializer.java:141)
at com.hortonworks.registries.schemaregistry.serde.AbstractSnapshotDeserializer.deserialize(AbstractSnapshotDeserializer.java:55)
at com.hortonworks.registries.schemaregistry.serde.SnapshotDeserializer.deserialize(SnapshotDeserializer.java:60)
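I know this is hard for you to reproduce, since it needs my schema registry and my files. Still, I'm hoping I'm just doing something silly here. Any help would be appreciated.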
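OK... I worked out from the error message that 79 is the ASCII code for the letter O. I then double-checked whether my files actually use the schema registry, and it turns out they don't. They are just plain Avro files with an embedded schema. So I don't need Hortonworks' AvroSnapshotDeserializer; a simple DataFileReader does the job.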
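To make the "79 is the letter O" clue concrete: Avro object container files start with the four magic bytes 'O', 'b', 'j', 1, so a deserializer that expects a registry protocol id in the first byte ends up reading 'O' (79) instead. The following is a minimal sketch, using only the JDK, of how one might check for that header before choosing a reader. The class name AvroMagicCheck and its helper methods are made up for illustration and are not part of Avro or the Hortonworks registry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library: detects the Avro container file header.
public class AvroMagicCheck {
    // Avro object container files begin with these four bytes: 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the given bytes start with the Avro container magic.
    static boolean looksLikeAvroContainer(byte[] header) {
        if (header == null || header.length < AVRO_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < AVRO_MAGIC.length; i++) {
            if (header[i] != AVRO_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first four bytes of the file and checks them against the magic.
    static boolean isAvroContainerFile(Path path) throws IOException {
        byte[] header = new byte[AVRO_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the header
                }
                read += n;
            }
        }
        return read == header.length && looksLikeAvroContainer(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java AvroMagicCheck <file>");
            return;
        }
        Path p = Paths.get(args[0]);
        System.out.println(p + " is an Avro container file: " + isAvroContainerFile(p));
    }
}
```

If the check returns true, the file carries its own schema. With the Avro library on the classpath, it can then be read with a DataFileReader constructed from the file and a GenericDatumReader, iterating over the records, with no registry client involved.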