Invalid token in XML string, failing to create an element tree in Python
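I'm stuck on a problem that is probably easy for others to solve. I'm trying to create an element tree from an XML string that is received over a socket rather than read from a file.
Approach:
The Python script below is a socket client. It receives a Python string (which happens to be XML) that a C++ server creates using TinyXML.
Program steps:
1) Create the socket
2) Receive the XML string
3) Parse the XML into an element tree that can be used elsewhere
Problem:
The fromstring() function can't seem to make sense of the string. Here is my code: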
import socket
import sys
import struct
import binascii
import io
import re
from xml.etree import ElementTree

# illegal characters to remove from the string later, before it goes to the XML parser
RE_XML_ILLEGAL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \
                 u'|' + \
                 u'([%s-%s][^%s-%s])|([^%s-%s][%s-%s])|([%s-%s]$)|(^[%s-%s])' % \
                 (unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
                  unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
                  unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff))

HOST = 'localhost'
PORT = 50008

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print 'Socket created'
print 'Socket now connecting'
s.connect((HOST,PORT))
s.send('1')  # as long as we are not sending "0" the cpp server will return information

# declare global xml object "root"
global root

while 1:
    data = s.recv(1024)  # receive the initial message
    data3 = data[:3]  # get first 3 letters
    if (data3 == "New"):
        # get ready for new packet
        nextsizestring = data[3:]
        nextsizestring2 = nextsizestring.rstrip('\0')  # strip trailing NUL padding
        nextsize = int(nextsizestring2,10)
        s.send('b')  # tell cpp we are ready for the packet
        databuf = s.recv(nextsize)  # data buffer as a python string
        databuf2 = re.sub(RE_XML_ILLEGAL, "?", databuf)  # remove illegal xml characters
        print(databuf2)
        root = ElementTree.ElementTree(ElementTree.fromstring(databuf2))  # convert to element tree
        print(root)
    elif (data3 != "New"):
        print("WARNING! TCP SYNCH HAS FAILED")
    if not data: break  # if not data then stop listening for more
    s.send('b')  # keep sending anything but zero to get more stuff

s.close()
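A side note on the receive step, separate from the parse error: recv(n) returns at most n bytes and may return fewer, so a long XML packet can arrive in several pieces. Below is a minimal sketch of a helper that loops until the whole packet has arrived. It is written for Python 3 (the script above is Python 2), and recv_exactly is a hypothetical name, not part of the original client.

```python
import socket


def recv_exactly(sock, size):
    """Read exactly `size` bytes from `sock`, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # The peer closed the connection before the full packet arrived.
            raise ConnectionError("socket closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Demonstrate with a local socket pair instead of the real C++ server.
    left, right = socket.socketpair()
    payload = b"<Frame><Time attribute='1' /></Frame>"
    left.sendall(payload[:10])   # send the packet in two pieces
    left.sendall(payload[10:])
    print(recv_exactly(right, len(payload)))
    left.close()
    right.close()
```

In the client above, this would stand in for the single s.recv(nextsize) call, assuming the size announced in the "New" message is the exact byte length of the XML packet.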
Here is the output:
Socket created
Socket now connecting
<Frame>
<FrameNumber ="1509677" />
<Time ="27427839" />
<Forceplatedata>
<Forceplate_0>
<Subframe#_0>
<F_x ="0" />
<F_y ="0" />
<F_z ="0" />
</Subframe#_0>
.
.
.
</Frame>
Traceback (most recent call last):
  File "<string>", line 11, in <module>
  File "C:\Users\Gelsey Torres- Oviedo\Desktop\VizardFolderVRServer\Python2CPP_Client_rev1.py", line 50, in <module>
    root = ElementTree.ElementTree(ElementTree.fromstring(databuf2))
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 18
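I took the liberty of truncating the XML string above because it is very long. As the error shows, something appears to be wrong at line 2, column 18, which I believe is the space " " character. I don't understand why that would be a problem.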
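To see exactly which line the parser is complaining about, the position carried by the exception can be used to pull that line out of the string. The following is a rough sketch for Python 3 using the position attribute of ElementTree.ParseError. The helper name describe_parse_error and the shortened sample string are made up for illustration.

```python
from xml.etree import ElementTree


def describe_parse_error(xml_text):
    """Return (line_number, offending_line) for malformed XML, or None if it parses."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError as err:
        # err.position is a (line, column) tuple; the line number is 1-based.
        line_number, _column = err.position
        offending_line = xml_text.splitlines()[line_number - 1]
        return line_number, offending_line
    return None


# Shortened stand-in for the packet printed above (an assumption, not the real data).
sample = '<Frame>\n    <FrameNumber ="1509677" />\n</Frame>'
print(describe_parse_error(sample))
```

Printing the offending line next to the error makes it easier to compare the reported column against the actual characters in the tag, rather than guessing which character is at fault.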
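Failed solutions:
1) Passing the string to parse() as a StringIO
2) Several variations of encoding and decoding as UTF-8
3) A similar approach with minidom
I'm guessing this is a syntax issue? I'm probably doing something really stupid...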
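Senshin pointed out the key issue: I was creating malformed XML. The attributes had values but no names.
After changing every place that looked like
<FrameNumber ="1381949" />
to
<FrameNumber attribute="1381949" />
the program can now create the element tree.
I knew it would be something simple. Thanks!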
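For anyone who wants to confirm the difference on the Python side, here is a small sketch (Python 3) that parses both forms and then reads the value back. The helper is_well_formed is a hypothetical name, and the one-element documents are cut-down stand-ins for the real packet.

```python
from xml.etree import ElementTree


def is_well_formed(xml_text):
    """Return True if ElementTree can parse the string, False otherwise."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError:
        return False
    return True


broken = '<Frame><FrameNumber ="1381949" /></Frame>'          # value without a name
fixed = '<Frame><FrameNumber attribute="1381949" /></Frame>'  # named attribute

print(is_well_formed(broken))
print(is_well_formed(fixed))

# With a named attribute, the value can be read back from the tree.
root = ElementTree.fromstring(fixed)
frame_number = int(root.find("FrameNumber").get("attribute"))
print(frame_number)
```

The same check would also reject element names such as Subframe#_0 from the output above, because # is not a legal character in an XML name, so those tags may need renaming on the server as well.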