Is it possible to continue reading from where the receiver stopped in Event Hub with Python?
I'm using the official azure-eventhub library to read events from an Event Hub topic. I'd like Event Hub to retain the last offset for my receiver group when the receiver stops reading, so that when I start reading again I can resume where I left off. In Kafka this is done with a commit policy, but I can't find anything similar in the azure-eventhub library. Is there anything like this in azure-eventhub, or in any other Python library for Event Hub?
You can use the Event Processor Host in Python to checkpoint to Blob Storage.
Detailed sample code is available here on GitHub; it is what we use for checkpointing.
Before using the Event Processor Host, you should have (or create) an Azure Storage account, which is used to save the checkpoint, and create a container inside that storage account. The storage account name, account key, and container name are all used in the sample code.
In the sample, the line context.checkpoint_async() is what sets the checkpoint.
Let me know if you have any further questions about this.
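The idea behind checkpointing is small: after processing events, persist the partition's last offset somewhere durable, and on restart skip everything at or before that offset. Below is a minimal, Azure-free sketch of that pattern; the FileCheckpointStore class and file layout are invented for illustration and are not part of any Azure SDK, which stores the same information in a Storage Blob instead.

```python
import json
import os
import tempfile

class FileCheckpointStore:
    """Toy checkpoint store: persists the last processed offset per partition,
    mimicking what checkpoint_async() does against Blob Storage."""

    def __init__(self, path):
        self.path = path

    def load(self, partition_id):
        # Return the last checkpointed offset, or None on a first run.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f).get(partition_id)

    def save(self, partition_id, offset):
        data = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                data = json.load(f)
        data[partition_id] = offset
        with open(self.path, "w") as f:
            json.dump(data, f)

# Simulate two receiver runs over one partition's stream of (offset, event) pairs.
events = [(i, f"event-{i}") for i in range(10)]
path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
store = FileCheckpointStore(path)

def run_receiver(stop_after):
    start = store.load("0")
    processed = []
    for offset, event in events:
        if start is not None and offset <= start:
            continue  # already processed in a previous run
        processed.append(event)
        store.save("0", offset)  # checkpoint after each event
        if len(processed) == stop_after:
            break
    return processed

first = run_receiver(stop_after=4)    # receiver "crashes" after 4 events
second = run_receiver(stop_after=10)  # restart resumes from the checkpoint
print(first)   # events 0-3
print(second)  # events 4-9, nothing reprocessed
```

The real Event Processor Host adds partition leasing and load balancing on top of this, but the resume-from-offset behaviour is the same.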
You can use azure-eventhub v5 to achieve this. The v5 SDK has built-in support for storing checkpoints in Azure Storage Blob in a straightforward way.
azure-eventhub v5 was officially released in January 2020; the latest version is v5.2.0, available on PyPI: https://pypi.org/project/azure-eventhub/
See the sample code below for how to do this in v5:
#!/usr/bin/env python
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
"""
An example to show receiving events from an Event Hub with checkpoint store doing checkpoint by batch.
In the `receive_batch` method of `EventHubConsumerClient`:
If no partition id is specified, the checkpoint_store are used for load-balance and checkpoint.
If partition id is specified, the checkpoint_store can only be used for checkpoint without load balancing.
"""
import os
import logging
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore
CONNECTION_STR = os.environ["EVENT_HUB_CONN_STR"]
EVENTHUB_NAME = os.environ['EVENT_HUB_NAME']
STORAGE_CONNECTION_STR = os.environ["AZURE_STORAGE_CONN_STR"]
BLOB_CONTAINER_NAME = "your-blob-container-name" # Please make sure the blob container resource exists.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
def on_event_batch(partition_context, event_batch):
    log.info("Partition {}, Received count: {}".format(partition_context.partition_id, len(event_batch)))
    # put your code here
    partition_context.update_checkpoint()

def receive_batch():
    checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONNECTION_STR, BLOB_CONTAINER_NAME)
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group="$Default",
        eventhub_name=EVENTHUB_NAME,
        checkpoint_store=checkpoint_store,
    )
    with client:
        client.receive_batch(
            on_event_batch=on_event_batch,
            max_batch_size=100,
            starting_position="-1",  # "-1" is from the beginning of the partition.
        )

if __name__ == '__main__':
    receive_batch()
We also provide a migration guide from v1 to v5 to help you migrate.