将 Storm 1 Kafka 拓扑转换为 Heron,有几个问题
Converting a Storm 1 Kafka Topology to Heron, have a few questions
一直在尝试将 Storm 1.0.6 拓扑切换到 Heron。迈出一小步,移除除 Kafka spout 之外的所有内容,看看情况如何。有一个main方法如下(修改自原来的Flux版本):
import org.apache.heron.eco.Eco;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class KafkaTopology {
public static void main(String[] args) throws Exception {
List<String> argList = new ArrayList<String>(Arrays.asList(args));
String file = KafkaTopology.class.getClassLoader().getResource("topology.yaml").getFile();
argList.add("local");
argList.add("--eco-config-file");
argList.add(file);
file = KafkaTopology.class.getClassLoader().getResource("dev.properties").getFile();
argList.add("--props");
argList.add(file);
argList.add("--sleep");
argList.add("36000000");
String[] ecoArgs = argList.toArray(new String[argList.size()]);
Eco.main(ecoArgs);
}
}
YAML 是这样的:
name: "kafkaTopology-XXX_topologyVersion_XXX"
type: "storm"
config:
topology.workers: ${workers.config}
topology.max.spout.pending: ${max.spout.pending}
topology.message.timeout.secs: 120
topology.testing.always.try.serialize: true
storm.zookeeper.session.timeout: 30000
storm.zookeeper.connection.timeout: 30000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 2000
properties:
kafka.mapper.zkServers: ${kafka.mapper.zkServers}
kafka.mapper.zkPort: ${kafka.mapper.zkPort}
bootstrap.servers: ${bootstrap.servers}
kafka.mapper.brokerZkStr: ${kafka.mapper.brokerZkStr}
kafka.topic.name: ${kafka.topic.name}
components:
- id: "zkHosts"
className: "org.apache.storm.kafka.ZkHosts"
constructorArgs:
- ${kafka.mapper.brokerZkStr}
- id: "rawMessageAndMetadataScheme"
className: "org.acme.storm.spout.RawMessageAndMetadataScheme"
- id: "messageMetadataSchemeAsMultiScheme"
className: "org.apache.storm.kafka.MessageMetadataSchemeAsMultiScheme"
constructorArgs:
- ref: "rawMessageAndMetadataScheme"
- id: "kafkaSpoutConfig"
className: "org.apache.storm.kafka.SpoutConfig"
constructorArgs:
# brokerHosts
- ref: "zkHosts"
# topic
- ${kafka.topic.name}
# zkRoot
- "/zkRootKafka.kafkaSpout.builder"
# id
- ${kafka.topic.name}
properties:
- name: "scheme"
ref: "messageMetadataSchemeAsMultiScheme"
- name: zkServers
value: ${kafka.mapper.zkServers}
- name: zkPort
value: ${kafka.mapper.zkPort}
# Retry Properties
- name: "retryInitialDelayMs"
value: 60000
- name: "retryDelayMultiplier"
value: 1.5
- name: "retryDelayMaxMs"
value: 14400000
- name: "retryLimit"
value: 0
# spout definitions
spouts:
- id: "kafka-spout"
className: "org.apache.storm.kafka.KafkaSpout"
parallelism: ${kafka.spout.parallelism}
constructorArgs:
- ref: "kafkaSpoutConfig"
相关 POM 条目:
<dependency>
<groupId>org.apache.heron</groupId>
<artifactId>heron-api</artifactId>
<version>0.20.3-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.heron</groupId>
<artifactId>heron-storm</artifactId>
<version>0.20.3-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.0.6</version>
</dependency>
Main 方法似乎 运行 没问题:
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Parsing eco config file
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Performing property substitution.
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Performing environment variable substitution.
topology type is Storm
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildConfig
INFO: Building topology config
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: ---------- TOPOLOGY DETAILS ----------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: Topology Name: kafkaTopology-XXX_topologyVersion_XXX
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------- SPOUTS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: kafka-spout [1] (org.apache.storm.kafka.KafkaSpout)
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: ---------------- BOLTS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------- STREAMS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------------------------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building components
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building spouts
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building bolts
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building streams
Process finished with exit code 0
问题一:拓扑立即退出,是否有一个Eco flag等价于Flux '--sleep'让它保持运行ning一段时间(到调试等)?
问题 2: 有点惊讶我需要将 storm-kafka 拉进来(以为会有 Heron 等效项)- 这是正确的(还是其他一些工件? ) 如果是这样,1.0.6 是一个可以使用的版本还是 Heron 与另一个版本一起工作得更好?
问题 3: 以上是 YAML 中的 type: "storm"
,尝试 type: "heron"
会出现以下错误:
INFO: Building spouts
Exception in thread "main" java.lang.ClassCastException: class org.apache.storm.kafka.KafkaSpout cannot be cast to class org.apache.heron.api.spout.IRichSpout (org.apache.storm.kafka.KafkaSpout and org.apache.heron.api.spout.IRichSpout are in unnamed module of loader 'app')
at org.apache.heron.eco.builder.heron.SpoutBuilder.buildSpouts(SpoutBuilder.java:42)
at org.apache.heron.eco.builder.heron.EcoBuilder.buildTopologyBuilder(EcoBuilder.java:70)
at org.apache.heron.eco.Eco.submit(Eco.java:125)
at org.apache.heron.eco.Eco.main(Eco.java:161)
at KafkaTopology.main(KafkaTopology.java:26)
Process finished with exit code 1
这就是它使用 Kafka 的方式吗,类型需要是 storm 而不是 heron,还是这里有一些解决方法?
- Heron 有多个 Kafka Spout。我使用 Storm(storm-kafka-client-2.1) 克隆并在生产中使用它。
- https://search.maven.org/artifact/com.github.thinker0.heron/heron-kafka-client/1.0.4.1/jar
Question 1
:我不确定为什么拓扑会关闭你。尝试使用 --verbose
标志 运行 提交。此时 --sleep
参数的功能不存在。如果您需要,它可能是添加的功能。
Question 2
:有一个 Heron 等效项。在将 Heron 捐赠给 Apache 之后,需要做很多工作才能发布二进制版本。大部分工作已经完成。在下一个版本中,我希望所有二进制工件都能适当分发。
Question 3
:出现此问题是因为它根据指定的类型在某个包中查找 bolts/spouts。当输入“storm”时,它期望它实现或扩展的 类 是“org.apache.storm”。当输入“heron”时,它期望它实现或扩展的 类 是“org.apache.heron”。如果您使用依赖项 storm-kafka,则类型需要为“storm”。苍鹭等价物可以在这里找到。 https://search.maven.org/search?q=heron-kafka
一直在尝试将 Storm 1.0.6 拓扑切换到 Heron。迈出一小步,移除除 Kafka spout 之外的所有内容,看看情况如何。有一个main方法如下(修改自原来的Flux版本):
import org.apache.heron.eco.Eco;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class KafkaTopology {
public static void main(String[] args) throws Exception {
List<String> argList = new ArrayList<String>(Arrays.asList(args));
String file = KafkaTopology.class.getClassLoader().getResource("topology.yaml").getFile();
argList.add("local");
argList.add("--eco-config-file");
argList.add(file);
file = KafkaTopology.class.getClassLoader().getResource("dev.properties").getFile();
argList.add("--props");
argList.add(file);
argList.add("--sleep");
argList.add("36000000");
String[] ecoArgs = argList.toArray(new String[argList.size()]);
Eco.main(ecoArgs);
}
}
YAML 是这样的:
name: "kafkaTopology-XXX_topologyVersion_XXX"
type: "storm"
config:
topology.workers: ${workers.config}
topology.max.spout.pending: ${max.spout.pending}
topology.message.timeout.secs: 120
topology.testing.always.try.serialize: true
storm.zookeeper.session.timeout: 30000
storm.zookeeper.connection.timeout: 30000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 2000
properties:
kafka.mapper.zkServers: ${kafka.mapper.zkServers}
kafka.mapper.zkPort: ${kafka.mapper.zkPort}
bootstrap.servers: ${bootstrap.servers}
kafka.mapper.brokerZkStr: ${kafka.mapper.brokerZkStr}
kafka.topic.name: ${kafka.topic.name}
components:
- id: "zkHosts"
className: "org.apache.storm.kafka.ZkHosts"
constructorArgs:
- ${kafka.mapper.brokerZkStr}
- id: "rawMessageAndMetadataScheme"
className: "org.acme.storm.spout.RawMessageAndMetadataScheme"
- id: "messageMetadataSchemeAsMultiScheme"
className: "org.apache.storm.kafka.MessageMetadataSchemeAsMultiScheme"
constructorArgs:
- ref: "rawMessageAndMetadataScheme"
- id: "kafkaSpoutConfig"
className: "org.apache.storm.kafka.SpoutConfig"
constructorArgs:
# brokerHosts
- ref: "zkHosts"
# topic
- ${kafka.topic.name}
# zkRoot
- "/zkRootKafka.kafkaSpout.builder"
# id
- ${kafka.topic.name}
properties:
- name: "scheme"
ref: "messageMetadataSchemeAsMultiScheme"
- name: zkServers
value: ${kafka.mapper.zkServers}
- name: zkPort
value: ${kafka.mapper.zkPort}
# Retry Properties
- name: "retryInitialDelayMs"
value: 60000
- name: "retryDelayMultiplier"
value: 1.5
- name: "retryDelayMaxMs"
value: 14400000
- name: "retryLimit"
value: 0
# spout definitions
spouts:
- id: "kafka-spout"
className: "org.apache.storm.kafka.KafkaSpout"
parallelism: ${kafka.spout.parallelism}
constructorArgs:
- ref: "kafkaSpoutConfig"
相关 POM 条目:
<dependency>
<groupId>org.apache.heron</groupId>
<artifactId>heron-api</artifactId>
<version>0.20.3-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.heron</groupId>
<artifactId>heron-storm</artifactId>
<version>0.20.3-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.0.6</version>
</dependency>
Main 方法似乎 运行 没问题:
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Parsing eco config file
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Performing property substitution.
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.parser.EcoParser loadTopologyFromYaml
INFO: Performing environment variable substitution.
topology type is Storm
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildConfig
INFO: Building topology config
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: ---------- TOPOLOGY DETAILS ----------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: Topology Name: kafkaTopology-XXX_topologyVersion_XXX
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------- SPOUTS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: kafka-spout [1] (org.apache.storm.kafka.KafkaSpout)
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: ---------------- BOLTS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------- STREAMS ---------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.Eco printTopologyInfo
INFO: --------------------------------------
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building components
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building spouts
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building bolts
Apr 30, 2021 4:38:49 PM org.apache.heron.eco.builder.storm.EcoBuilder buildTopologyBuilder
INFO: Building streams
Process finished with exit code 0
问题一:拓扑立即退出,是否有一个Eco flag等价于Flux '--sleep'让它保持运行ning一段时间(到调试等)?
问题 2: 有点惊讶我需要将 storm-kafka 拉进来(以为会有 Heron 等效项)- 这是正确的(还是其他一些工件? ) 如果是这样,1.0.6 是一个可以使用的版本还是 Heron 与另一个版本一起工作得更好?
问题 3: 以上是 YAML 中的 type: "storm"
,尝试 type: "heron"
会出现以下错误:
INFO: Building spouts
Exception in thread "main" java.lang.ClassCastException: class org.apache.storm.kafka.KafkaSpout cannot be cast to class org.apache.heron.api.spout.IRichSpout (org.apache.storm.kafka.KafkaSpout and org.apache.heron.api.spout.IRichSpout are in unnamed module of loader 'app')
at org.apache.heron.eco.builder.heron.SpoutBuilder.buildSpouts(SpoutBuilder.java:42)
at org.apache.heron.eco.builder.heron.EcoBuilder.buildTopologyBuilder(EcoBuilder.java:70)
at org.apache.heron.eco.Eco.submit(Eco.java:125)
at org.apache.heron.eco.Eco.main(Eco.java:161)
at KafkaTopology.main(KafkaTopology.java:26)
Process finished with exit code 1
这就是它使用 Kafka 的方式吗,类型需要是 storm 而不是 heron,还是这里有一些解决方法?
- Heron 有多个 Kafka Spout。我使用 Storm(storm-kafka-client-2.1) 克隆并在生产中使用它。
- https://search.maven.org/artifact/com.github.thinker0.heron/heron-kafka-client/1.0.4.1/jar
Question 1
:我不确定为什么拓扑会关闭你。尝试使用 --verbose
标志 运行 提交。此时 --sleep
参数的功能不存在。如果您需要,它可能是添加的功能。
Question 2
:有一个 Heron 等效项。在将 Heron 捐赠给 Apache 之后,需要做很多工作才能发布二进制版本。大部分工作已经完成。在下一个版本中,我希望所有二进制工件都能适当分发。
Question 3
:出现此问题是因为它根据指定的类型在某个包中查找 bolts/spouts。当输入“storm”时,它期望它实现或扩展的 类 是“org.apache.storm”。当输入“heron”时,它期望它实现或扩展的 类 是“org.apache.heron”。如果您使用依赖项 storm-kafka,则类型需要为“storm”。苍鹭等价物可以在这里找到。 https://search.maven.org/search?q=heron-kafka