Spark reading from Azure Eventhub => StreamingQueryException: Input byte array has wrong 4-byte ending unit

I am trying to read Azure Eventhub messages with Spark/Python. Every time, I get the exception "StreamingQueryException: Input byte array has wrong 4-byte ending unit".

Any ideas?

conf = {}
conf["eventhubs.connectionString"] = "Endpoint=sb://XXXXXXXXX.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXX=;EntityPath=XXXXXX"

read_df = spark.readStream.format("eventhubs").options(**conf).load()
stream = read_df.writeStream.format("console").start()
stream.awaitTermination()

Note that for version 2.3.15 and above, the connection string in the configuration dictionary must be encrypted:

ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)

https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md#event-hubs-configuration
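
Putting the two pieces together, here is a minimal sketch of the question's code with the encryption step applied. The error text matches the message Java's Base64 decoder raises, which suggests (an assumption, not something confirmed in this thread) that the connector is trying to decode the plain-text connection string as if it were already encrypted. The helper names `build_eventhubs_conf` and `start_console_stream` are made up for illustration. The sketch also assumes an active `spark` session with the Event Hubs connector on the classpath.

```python
def build_eventhubs_conf(spark, connection_string):
    """Build the options dict with the connection string encrypted.

    Hypothetical helper: assumes the azure-event-hubs-spark connector
    (2.3.15 or later) is available in the driver JVM.
    """
    sc = spark.sparkContext
    # Same call as in the answer above, reached through the Py4J gateway.
    encrypted = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(
        connection_string
    )
    return {"eventhubs.connectionString": encrypted}


def start_console_stream(spark, connection_string):
    """Start the same console stream as in the question, using the encrypted config."""
    conf = build_eventhubs_conf(spark, connection_string)
    read_df = spark.readStream.format("eventhubs").options(**conf).load()
    return read_df.writeStream.format("console").start()
```

With a real session this would be used as `start_console_stream(spark, connectionString).awaitTermination()`, where `connectionString` is the plain "Endpoint=sb://..." string from the question.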