How can I change the Debezium default topic naming convention to fit the Confluent HDFS sink connector's auto-generated Hive table strategy?

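I am building a data synchronizer that captures data changes from a MySQL source and exports the data to Hive.

I chose Kafka Connect to implement this. I use Debezium as the source connector and the Confluent HDFS connector as the sink connector.

The problem is that Debezium names its Kafka topics as follows: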

serverName.databaseName.tableName

In the Confluent HDFS sink properties, I have to set topics to the same name that Debezium generates:

"topics": "serverName.databaseName.tableName"

The Confluent HDFS sink connector then generates a path like this in HDFS:

/topics/serverName.databaseName.tableName/partition=0

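This causes problems in HDFS/Hive because the path contains the "." character. In fact, the external table that the Confluent HDFS sink connector generates automatically fails to be created because of the path: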

2020-05-08T00:42:02,717 ERROR [pool-6-thread-31] metastore.RetryingHMSHandler: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6935)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2050)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
    at com.sun.proxy.$Proxy26.create_table_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14800)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14784)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.run(TUGIBasedProcessor.java:111)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.run(TUGIBasedProcessor.java:107)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
    at org.apache.hadoop.fs.Path.initialize(Path.java:263)
    at org.apache.hadoop.fs.Path.<init>(Path.java:254)
    at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:143)
    at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:147)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1852)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1786)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2035)
    ... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:260)
    ... 26 more

So, can I change Debezium's default naming convention for topics, or can I change the default path that the Confluent HDFS sink connector generates from the topic name?

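The HDFS connector will replace dots (and dashes) with underscores when creating Hive tables.

HDFS itself doesn't care about dots in paths. The problem is that you can't have a dot after the port, and you somehow have /null in there: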

hdfs://localhost:9000./null


Is there any way that I can change the Debezium default naming convention for topics?

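The solution is not specific to Debezium. You can use RegexRouter, which ships with the base Apache Kafka Connect library, in the transforms configuration of either the source connector or the sink connector, depending on how early you want to "fix" the problem.

You could also write your own transform and put it on Connect's plugin.path.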
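As a rough sketch of the RegexRouter approach, the properties below could be added to the Debezium source connector's configuration. The alias "route" and the underscore-joined replacement are illustrative choices, not something from the question, and the regex assumes topic names with exactly three dot-separated parts:

```json
{
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
  "transforms.route.replacement": "$1_$2_$3"
}
```

With a rename like this on the source side, records would be written to serverName_databaseName_tableName, so the sink's topics setting would have to use that name too. Topics that do not match the regex are left unchanged.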