What is the most efficient way to receive XML data until a delimiter is reached
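I'm currently having trouble with a basic socket server. Essentially, I have no control over the clients of this server, and the clients send XML messages of unknown length that are delimited by a known set of characters. A basic reproduction of the problem can be demonstrated as follows: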
import socket

server_address = ('192.168.2.47', 10000)


# client
def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server_address)
    # Python 3 sockets carry bytes, so the payloads are bytes literals;
    # sendall() keeps retrying until every byte has been written
    sock.sendall(b'<messageBody><ew32f/><dwadwa/></messageBody>')
    sock.sendall(b'<messageBody><dwaaw/><fewwfe/></messageBody>')
    sock.sendall(b'<messageBody><ewqf3x/><awdwad2/></messageBody>')
    # the socket will stay connected so long as the client continues sending data, which could be days or more


# server
def server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(server_address)
    sock.listen(1)  # accept() requires the socket to be listening first
    client, addr = sock.accept()
    # I need to find a way to receive the client data such that it stops receiving at the </messageBody> tag
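I'm trying to find the most efficient way to solve this, because the server may receive hundreds of messages per second from various clients. These messages can range in size from a few bytes to several kilobytes.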
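Because every message ends with the same fixed byte sequence, one common approach (a sketch, not taken from the original post) is to read large chunks with `recv()`, append them to a buffer, and split on `</messageBody>`. The helper name `recv_messages` and the 64 KiB read size are assumptions. The search never rescans bytes that cannot contain the delimiter, and a partial message stays buffered until the rest of it arrives.

```python
import socket

DELIMITER = b'</messageBody>'  # the known end-of-message marker from the question


def recv_messages(conn, delimiter=DELIMITER, bufsize=65536):
    """Yield each delimiter-terminated message received on a connected socket.

    Hypothetical helper: bytes after the last delimiter stay buffered until
    the rest of the message arrives in a later recv() call.
    """
    buffer = bytearray()
    search_from = 0  # offset where the next delimiter search starts
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # empty read: the peer closed the connection
            return
        buffer += chunk
        while True:
            end = buffer.find(delimiter, search_from)
            if end == -1:
                # Only the last len(delimiter) - 1 bytes could begin a delimiter
                # that the next chunk completes, so skip everything before them.
                search_from = max(0, len(buffer) - len(delimiter) + 1)
                break
            end += len(delimiter)
            yield bytes(buffer[:end])
            del buffer[:end]  # drop the message that was just handed out
            search_from = 0


if __name__ == '__main__':
    # Demo with a local socket pair standing in for a real client.
    a, b = socket.socketpair()
    a.sendall(b'<messageBody><x/></messageBody><messageBody><y/></mess')
    a.sendall(b'ageBody>')
    a.close()
    for message in recv_messages(b):
        print(message)
    b.close()
```

In the question's `server()`, this would be driven as `for message in recv_messages(client): ...`. To serve many clients at once, each connection would typically get its own thread (for example via `socketserver.ThreadingTCPServer`) or be multiplexed with `selectors`. Splitting purely on bytes assumes the delimiter never occurs inside a message, for instance no nested `<messageBody>` elements.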
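I think Python's expat parser can help you: it supports streaming (incremental) parsing, and the chunks you feed it can be fragments of XML (like bar in my example below).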
I'm fairly sure I understand your problem and its context. Here is my attempt at showing your server receiving this sample XML:
<root>
  <foo />
  <bar />
  <messageBody>
    <ewqf3x />
    <awdwad2 />
  </messageBody>
  <baz />
</root>
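but in chunks, as if the client were handing you the whole XML body over several calls. Each chunk is parsed, and when the </messageBody> end tag is read, an exception is raised. That is your signal that you have everything you need and can stop processing (listening?).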
#!/usr/bin/env python3
from xml.parsers.expat import ParserCreate


class FoundMessageBodyEnd(Exception):
    """Raised from the expat callback once </messageBody> has been read."""


def end_element(name):
    print(f'Processing end-tag for {name}')
    if name == 'messageBody':
        # This may not be the right way to do this
        raise FoundMessageBodyEnd


p = ParserCreate()
p.EndElementHandler = end_element

streaming_chunks = [
    '''<root>
<foo />
<bar ''',  # notice that bar is not closed till the first line of the next chunk
    '''/>
<messageBody>
<ewqf3x />
<awdwad2 />''',
    ''' </messageBody>''',
    ''' <baz />
</root>''',
]

for count, chunk in enumerate(streaming_chunks, start=1):
    try:
        # isfinal defaults to False, so an incomplete document is accepted
        p.Parse(chunk)
    except FoundMessageBodyEnd:
        print(f'After parsing {count} chunks, found messageBody delimiter, done.')
        break  # a normal stop, not an error exit
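To apply the same streaming idea to the socket from the question, the bytes returned by `recv()` can be fed straight into an incremental parser. The sketch below swaps raw expat for the standard library's `xml.etree.ElementTree.XMLPullParser` (which is built on expat) so that each finished message comes back as an element object. The helper name `iter_message_bodies` and the synthetic `<stream>` wrapper are assumptions: the wrapper is needed because the client sends several top-level `<messageBody>` elements back to back, and a plain XML parser rejects anything after the first top-level element.

```python
import xml.etree.ElementTree as ET


def iter_message_bodies(chunks, tag='messageBody'):
    """Yield every complete top-level <tag> element from an iterable of byte chunks.

    Hypothetical helper. A synthetic <stream> start tag is fed first so the
    parser accepts many back-to-back <messageBody> elements.
    """
    parser = ET.XMLPullParser(events=('start', 'end'))
    parser.feed(b'<stream>')  # assumed wrapper, never sent by the client
    root = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags at any byte
        for event, elem in parser.read_events():
            if event == 'start':
                if root is None:
                    root = elem  # the synthetic <stream> element
                depth += 1
            else:
                depth -= 1
                if depth == 1 and elem.tag == tag:
                    yield elem
                    # detach finished messages so memory stays bounded on
                    # connections that stay open for days
                    root.remove(elem)


if __name__ == '__main__':
    # Simulate recv() splitting the question's messages at arbitrary points.
    chunks = [
        b'<messageBody><ew32f/><dw',
        b'adwa/></messageBody><messageBody><dwaaw/>',
        b'<fewwfe/></messageBody>',
    ]
    for body in iter_message_bodies(chunks):
        print(ET.tostring(body))
```

With the connection returned by `accept()`, `iter(lambda: client.recv(65536), b'')` turns the socket into an iterable of chunks that ends when the peer disconnects, so it can be passed directly as `chunks`. Unlike plain delimiter splitting, this tolerates whitespace and nested elements, at the cost of actually parsing every byte.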