Is it possible to continue reading from where receiver stopped in EventHub with Python?

I am reading events from an Event Hub using the official azure-eventhub library. I would like Event Hubs to retain the last offset for my consumer group when my receiver stops reading, so that when I start reading again I can pick up where I left off. In Kafka this is done with a commit strategy, but I cannot find anything similar in the azure-eventhub library. Is there something like this in azure-eventhub, or in any other Python library for Event Hubs?

You can use the Event Processor Host in Python to checkpoint to Blob storage.

Detailed sample code is here on GitHub; it is what we use for checkpointing purposes.

Before using the Event Processor Host, you should have/create an Azure Storage account, which is used to save the checkpoint, and create a container inside that storage account. The storage account / account key / container are used in the sample code here.

In the sample, this line of code, context.checkpoint_async(), is used to set the checkpoint.

Please let me know if you have any further questions about this.

You can achieve your goal with azure-eventhub v5. The v5 SDK has built-in support for storing checkpoints in Azure Storage Blob in a straightforward way.

azure-eventhub v5 became generally available in January 2020; the latest version is v5.2.0.

It is available on PyPI: https://pypi.org/project/azure-eventhub/

Please check the sample code below for how to achieve this in v5:

#!/usr/bin/env python

# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

"""
An example showing how to receive events from an Event Hub with a checkpoint store, checkpointing by batch.
In the `receive_batch` method of `EventHubConsumerClient`:
If no partition id is specified, the checkpoint_store is used for load balancing and checkpointing.
If a partition id is specified, the checkpoint_store can only be used for checkpointing, not load balancing.
"""

import os
import logging
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

CONNECTION_STR = os.environ["EVENT_HUB_CONN_STR"]
EVENTHUB_NAME = os.environ['EVENT_HUB_NAME']
STORAGE_CONNECTION_STR = os.environ["AZURE_STORAGE_CONN_STR"]
BLOB_CONTAINER_NAME = "your-blob-container-name"  # Please make sure the blob container resource exists.

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def on_event_batch(partition_context, event_batch):
    log.info("Partition {}, Received count: {}".format(partition_context.partition_id, len(event_batch)))
    # put your code here
    # Persist the position of the last event in this batch so a restarted
    # receiver resumes from here instead of re-reading the partition.
    partition_context.update_checkpoint()


def receive_batch():
    checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONNECTION_STR, BLOB_CONTAINER_NAME)
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group="$Default",
        eventhub_name=EVENTHUB_NAME,
        checkpoint_store=checkpoint_store,
    )
    with client:
        client.receive_batch(
            on_event_batch=on_event_batch,
            max_batch_size=100,
            starting_position="-1",  # "-1" is from the beginning of the partition.
        )


if __name__ == '__main__':
    receive_batch()

We also provide a migration guide from v1 to v5.
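Conceptually, the checkpoint store plays the same role as committed offsets in Kafka: it persists, per partition, the offset of the last event you processed, and a restarted receiver resumes from the next one. Here is a minimal in-memory sketch of that idea (the class and function names are hypothetical, not the SDK's API; the real BlobCheckpointStore persists this state in a blob container instead of a dict):

```python
class InMemoryCheckpointStore:
    """Maps partition id -> last processed offset (stands in for blob storage)."""

    def __init__(self):
        self._offsets = {}

    def update_checkpoint(self, partition_id, offset):
        self._offsets[partition_id] = offset

    def get_checkpoint(self, partition_id):
        # None means no checkpoint yet: start from the beginning.
        return self._offsets.get(partition_id)


def consume(events, store, partition_id, max_events):
    """Read up to max_events, starting just after the last checkpoint."""
    checkpoint = store.get_checkpoint(partition_id)
    start = 0 if checkpoint is None else checkpoint + 1
    processed = []
    for offset in range(start, min(start + max_events, len(events))):
        processed.append(events[offset])
        store.update_checkpoint(partition_id, offset)  # analogous to a Kafka commit
    return processed


events = ["e0", "e1", "e2", "e3", "e4"]
store = InMemoryCheckpointStore()

first_run = consume(events, store, "0", max_events=3)   # ["e0", "e1", "e2"]
# Simulates a receiver restart: the stored offset makes it resume at "e3".
second_run = consume(events, store, "0", max_events=3)  # ["e3", "e4"]
```

The v5 sample above does the same thing: `partition_context.update_checkpoint()` writes the position to blob storage, and a new `EventHubConsumerClient` created with the same `checkpoint_store` and consumer group resumes from that position rather than from `starting_position`.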