What is the most efficient way to receive XML data until a delimiter is reached

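I'm currently having trouble with a basic socket server. Essentially, I have no control over the clients of this server, and the clients send XML messages of unknown length that are delimited by a known set of characters. A basic reproduction of the problem looks like this: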

import socket
server_address = ('192.168.2.47', 10000)

#client
def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server_address)
    # Python 3 sockets send bytes, not str; sendall() retries until everything is sent
    sock.sendall(b'<messageBody><ew32f/><dwadwa/></messageBody>')
    sock.sendall(b'<messageBody><dwaaw/><fewwfe/></messageBody>')
    sock.sendall(b'<messageBody><ewqf3x/><awdwad2/></messageBody>')
    # the socket stays connected for as long as the client keeps sending data, which could be days or more

#server
def server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(server_address)
    sock.listen(1)                # listen() must be called before accept()
    client, addr = sock.accept()  # accept() takes no arguments
    # I need a way to receive the client data such that each read stops at the </messageBody> tag

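I'm trying to find the most efficient way to solve this, because the server may receive hundreds of messages per second from various clients. The messages can range in size from a few bytes to several kilobytes.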
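For comparison, the usual way to frame a TCP byte stream on a known terminator, with no XML parsing at all, is to append each recv() result to a buffer and cut complete messages off at the delimiter. Below is a minimal sketch, assuming the terminator is exactly b'</messageBody>' and that it never appears nested or inside a message's content. iter_messages and its parameters are hypothetical names, not part of the question's code.

```python
def iter_messages(conn, delimiter=b'</messageBody>', bufsize=4096):
    """Yield each complete message (delimiter included) read from conn.

    conn is any object with a socket-like recv(bufsize) method that returns
    b'' once the peer has closed the connection. Assumes (hypothetically)
    that the delimiter never appears inside a message.
    """
    buffer = bytearray()
    while True:
        data = conn.recv(bufsize)
        if not data:
            break  # peer closed; an incomplete trailing message is dropped
        # Any delimiter not found yet must end inside the new data, so the
        # search can start len(delimiter) - 1 bytes before the new data.
        scan = max(0, len(buffer) - len(delimiter) + 1)
        buffer += data
        start = 0
        while True:
            end = buffer.find(delimiter, scan)
            if end == -1:
                break
            end += len(delimiter)
            yield bytes(buffer[start:end])
            start = scan = end
        del buffer[:start]  # keep only the unfinished tail
```

In the question's server() this could be used as "for msg in iter_messages(client): ...". Starting each search just before the newly received bytes avoids rescanning the whole buffer on every recv() call, which matters when a multi-kilobyte message arrives over many small reads.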

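I think Python's expat parser can help here: it supports streaming (incremental) parsing, and each chunk can be an arbitrary fragment of XML (like bar in my example below).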

I'm fairly sure I understand your problem and its context. Here is my attempt at showing your server receiving this sample XML:

<root>
    <foo />
    <bar />
    <messageBody>
        <ewqf3x />
        <awdwad2 />
    </messageBody>
    <baz />
</root>

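but in chunks, as if the client were handing you the whole XML body over several calls. Each chunk is parsed, and when the </messageBody> end tag is read, an exception is raised; that is your signal that you have everything you need and can stop processing (listening?).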

#!/usr/bin/env python3
import sys
from xml.parsers.expat import ParserCreate

class FoundMessageBodyEnd(Exception):
    pass

def end_element(name):
    print(f'Processing end-tag for {name}')
    if name == 'messageBody':
        # This may not be the right way to do this
        raise FoundMessageBodyEnd


p = ParserCreate()
p.EndElementHandler = end_element

streaming_chunks = [
    '''<root>
    <foo />
    <bar ''',  # notice that bar is not closed till the first line of the next chunk
    '''/>
        <messageBody>
        <ewqf3x />
        <awdwad2 />''',
    '''    </messageBody>''',
    '''    <baz />
</root>''',
]

parsed = 0
for chunk in streaming_chunks:
    try:
        p.Parse(chunk)
        parsed += 1
    except FoundMessageBodyEnd:
        print(f'After parsing {parsed+1} chunks, found messageBody delimiter, done.')
        break  # stop feeding chunks; finding the delimiter is success, not an error exit
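The example above assumes a single document with one root and stops at the first </messageBody>. The client in the question instead sends an open-ended sequence of top-level <messageBody> elements, and to a plain expat parser everything after the first of them is not well-formed ("junk after document element"). One workaround, sketched here under the assumption that the stream contains nothing but such elements, is to feed a synthetic opening root tag before any client data and then collect each messageBody as it closes. This swaps in the standard library's xml.etree.ElementTree.XMLPullParser for raw expat handlers; the <stream> root tag and the MessageStream class are hypothetical names for this illustration.

```python
import xml.etree.ElementTree as ET


class MessageStream:
    """Incrementally parse back-to-back <messageBody> elements from one connection."""

    def __init__(self, tag='messageBody'):
        self.tag = tag
        self.parser = ET.XMLPullParser(events=('start', 'end'))
        # Hypothetical synthetic root: makes the open-ended sequence of
        # top-level messages look like the children of one document.
        self.parser.feed(b'<stream>')
        self.depth = 0
        self.root = None

    def feed(self, chunk):
        """Feed one chunk (e.g. a recv() result); return the messages it completed."""
        self.parser.feed(chunk)
        done = []
        for event, elem in self.parser.read_events():
            if event == 'start':
                self.depth += 1
                if self.depth == 1:
                    self.root = elem  # the synthetic <stream> element
            else:
                self.depth -= 1
                if self.depth == 1:
                    # A direct child of <stream> just closed: detach it so memory
                    # does not grow on a long-lived connection.
                    self.root.remove(elem)
                    if elem.tag == self.tag:
                        done.append(elem)
        return done
```

Create one MessageStream per connection and pass each recv() result to feed(); malformed input raises xml.etree.ElementTree.ParseError. Unlike the exception-based approach, the parser keeps running after each message, so a single parser can serve a connection that stays open for days.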