Kafka Stream to Spark Stream in Python

We have a Kafka stream that carries Avro-encoded messages, and I need to connect it to a Spark stream. Following a suggestion, I used the code below.

kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers}, valueDecoder=MessageSerializer.decode_message) 
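For reference, here is a minimal sketch of the surrounding setup this call needs, assuming MessageSerializer comes from a Confluent-style schema-registry client; the registry URL, topic, and broker list are placeholders. Note that decode_message is an instance method, so it normally has to be bound to a serializer instance rather than referenced on the class as above:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
# Assumption: the serializer comes from a Confluent schema-registry client package
from confluent.schemaregistry.client import CachedSchemaRegistryClient
from confluent.schemaregistry.serializers import MessageSerializer

sc = SparkContext(appName="kafka_to_spark_stream")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

brokers = "broker1:9092"   # placeholder broker list
topic = "my_avro_topic"    # placeholder topic name

# decode_message needs a serializer instance backed by the schema registry;
# the registry URL here is a placeholder.
schema_registry = CachedSchemaRegistryClient(url="http://schema-registry:8081")
serializer = MessageSerializer(schema_registry)

kvs = KafkaUtils.createDirectStream(
    ssc,
    [topic],
    {"metadata.broker.list": brokers},
    valueDecoder=serializer.decode_message,
)

kvs.pprint()  # print the decoded records for each micro-batch
ssc.start()
ssc.awaitTermination()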

When I execute it via spark-submit, I get the following error.

2018-10-09 10:49:27 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Requesting driver to remove executor 12 for reason Container marked as failed: container_1537396420651_0008_01_000013 on host: server_name. Exit status: 1. Diagnostics: [2018-10-09 10:49:25.810]Exception from container-launch. Container id: container_1537396420651_0008_01_000013 Exit code: 1

[2018-10-09 10:49:25.810]

[2018-10-09 10:49:25.811]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err :

Last 4096 bytes of stderr :

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000d5580000, 702545920, 0) failed; error='Cannot allocate memory' (errno=12)

[2018-10-09 10:49:25.822]

[2018-10-09 10:49:25.822]Container exited with a non-zero exit code 1. Error file: prelaunch.err.

Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr :

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000d5580000, 702545920, 0) failed; error='Cannot allocate memory' (errno=12)
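The diagnostics above are truncated to the last 4096 bytes. If more context is needed, the full container logs can usually be pulled with the YARN CLI; the application ID is embedded in the container ID from the message:

yarn logs -applicationId application_1537396420651_0008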

I used the following spark-submit command.

spark-submit --master yarn --py-files ${BIG_DATA_LIBS}v3io-py.zip --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars ${BIG_DATA_LIBS}v3io-hcfs_2.11.jar,${BIG_DATA_LIBS}v3io-spark2-object-dataframe_2.11.jar,${BIG_DATA_LIBS}v3io-spark2-streaming_2.11.jar ${APP_PATH}/${SCRIPT_PATH}/kafka_to_spark_stream.py

All of the variables are exported correctly. What does this error mean?

Could it be that you haven't allocated enough memory on the driver/executors to process the stream? The key line is os::commit_memory(...) failed; error='Cannot allocate memory' (errno=12): the JVM asked the operating system for roughly 670 MB (702545920 bytes) of heap and the host could not provide it, so the container exited with code 1 before your application code even ran.
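As a starting point, memory can be sized explicitly on the same command. This is only a sketch; the figures are placeholders to tune against what your YARN nodes can actually offer:

spark-submit --master yarn \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  --num-executors 2 \
  --py-files ${BIG_DATA_LIBS}v3io-py.zip \
  ... (remaining --packages/--jars arguments as in the original command)

If the hosts themselves are short on free memory, lowering --num-executors or --executor-memory so that each executor fits within a node's available capacity can matter more than raising the numbers.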