How to use lxml iterparse from Azure StorageStreamDownloader?
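I am currently using lxml.etree.iterparse to iterate over an XML file tag by tag. This works fine locally, but I want to move the XML file to Azure Blob Storage and process it in an Azure Function. However, I am stuck trying to parse the XML file from a StorageStreamDownloader.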
Local code
from lxml import etree
context = etree.iterparse('c:\Users\', tag='InstanceElement')
for event, elem in context:
    # processing of the tag
    pass
Streaming from Blob
from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)
System = service.get_file_system_client('')
FileClient = System.get_file_client('')
Stream = FileClient.download_file()
# Stuck on what the input must be for iterparse
context = etree.iterparse(, tag='InstanceElement')
for event, elem in context:
    # processing of the tag
    pass
I don't know what the input to iterparse must be, so does anyone have ideas on how to parse the XML file while streaming it?
Try this:
from lxml import etree
from azure.storage.filedatalake import DataLakeServiceClient
from io import BytesIO
connect_str = ''
service = DataLakeServiceClient.from_connection_string(conn_str=connect_str)
System = service.get_file_system_client('')
FileClient = System.get_file_client('test.xml')
content = FileClient.download_file().readall()
context = etree.iterparse(BytesIO(content), tag='InstanceElement')
for event, elem in context:
    print(elem.text)
Contents of my test.xml:
Result:
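Note that readall() loads the whole file into memory before parsing starts. If the file is large, one possible alternative is to wrap the download in a small file-like object that iterparse can read from incrementally. The following is only a sketch: ChunkReader and iter_elements are hypothetical helper names, not part of lxml or the Azure SDK, and it assumes the installed SDK version exposes chunks() on the object returned by download_file().

```python
import io


class ChunkReader(io.RawIOBase):
    """Hypothetical helper: exposes an iterator of bytes chunks as a readable stream."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the buffer from the next non-empty chunk; return 0 at end of data.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                return 0
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n


def iter_elements(chunks, tag):
    """Yield each element matching `tag`, parsing the chunks incrementally."""
    # Imported here so ChunkReader itself only depends on the standard library.
    from lxml import etree

    for _event, elem in etree.iterparse(ChunkReader(chunks), tag=tag):
        yield elem
        # Free the element's content once the caller is done with it.
        elem.clear()
```

Under those assumptions, usage with the code above would look roughly like this:

```python
# Assumption: download_file() returns an object with a chunks() method.
for elem in iter_elements(FileClient.download_file().chunks(), 'InstanceElement'):
    print(elem.text)
```

The trade-off is that each element is only valid inside the loop body, since it is cleared as soon as the loop advances.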