如何使用直接流在 Kafka Spark Streaming 中指定消费者组

how to specify consumer group in Kafka Spark Streaming using direct stream

如何使用直接流为 kafka spark streaming 指定消费者组 ID API。

HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", brokers);
kafkaParams.put("auto.offset.reset", "largest");
kafkaParams.put("group.id", "app1");

    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
            jssc, 
            String.class, 
            String.class,
            StringDecoder.class, 
            StringDecoder.class, 
            kafkaParams, 
            topicsSet
    );

虽然我已经指定了配置但不确定是否遗漏了什么。使用 spark1.3

kafkaParams.put("group.id", "app1");

直接流 API 使用低级别的 Kafka API,因此无论如何都不使用消费者组。如果您想将消费者组与 Spark Streaming 一起使用,则必须使用基于 API.

的接收器

Full details are available in the doc !

createDirectStream in spark-streaming-kafka-0-8 不支持组模式,因为它使用的是低级别的Kafka API。

但是spark-streaming-kafka-0-10支持群组模式

Consumer Configs

In 0.9.0.0 we introduced the new Java consumer as a replacement for the older Scala-based simple and high-level consumers. The configs for both new and old consumers are described below.

New Consumer Configs 中,它有 group.id 项。

Spark Streaming integration for Kafka 0.10 正在使用新的 API。 https://spark.apache.org/docs/2.1.1/streaming-kafka-0-10-integration.html

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage.

我已经在 spark-streaming-kafka-0-10 中测试了群组模式,它确实有效。