Error parsing XML from a file but not from a string (Python)
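I am trying to use xmltodict to parse a large number of XML files so that I can convert them to dataframes. However, when I try to parse an actual XML file I get the error:
"ExpatError: not well-formed (invalid token): line 1, column 5"
The error is exactly the same for every XML file, including the "line 1, column 5" part, even though the files vary greatly in length (they all share the same structure).
When I paste the contents of an XML file into Python as a string instead, xmltodict parses it perfectly. For example: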
xmlstr ="""<?xml version="1.0" encoding="utf-8"?>
<document id="DDI-DrugBank.d200">
<sentence id="DDI-DrugBank.d200.s0" text="Co-administration of probenecid with acyclovir has been shown to increase the mean half-life and the area under the concentration-time curve.">
<entity id="DDI-DrugBank.d200.s0.e0" charOffset="21-30"
type="drug" text="probenecid"/>
<entity id="DDI-DrugBank.d200.s0.e1" charOffset="37-45"
type="drug" text="acyclovir"/>
<pair id="DDI-DrugBank.d200.s0.p0" e1="DDI-DrugBank.d200.s0.e0"
e2="DDI-DrugBank.d200.s0.e1" ddi="true" type="mechanism"/>
</sentence>
<sentence id="DDI-DrugBank.d200.s1" text="Urinary excretion and renal clearance were correspondingly reduced."/>
<sentence id="DDI-DrugBank.d200.s2" text="The clinical effects of this combination have not been studied."/>
</document>"""
import xmltodict as x2d
nestdict1 = x2d.parse('Train/DrugBank/Aciclovir_ddi.xml')
nestdict2 = x2d.parse(xmlstr)
In the example above, nestdict1 throws the error while nestdict2 works fine, even though xmlstr was copied and pasted directly from the file 'Train/DrugBank/Aciclovir_ddi.xml'.
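You need to pass a file object, not a string holding the file name. A string argument is treated as the XML content itself, so the path is what gets parsed, which would explain why every file fails at the same position.
From the docs: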
In [4]:print(xmltodict.parse.__doc__)
Parse the given XML input and convert it into a dictionary.
`xml_input` can either be a `string` or a file-like object.
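The symptom can be reproduced without xmltodict. ExpatError comes from Python's built-in expat parser, so the sketch below feeds the path string to that parser directly. It assumes xmltodict hands a string argument to expat unchanged, and the helper name parse_error is made up for illustration.

```python
import xml.parsers.expat


def parse_error(text):
    """Return the expat error message for `text`, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True: `text` is the complete document
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# The path is read as if it were XML markup, so the parser rejects it.
print(parse_error("Train/DrugBank/Aciclovir_ddi.xml"))
# An actual XML document held in a string parses without an error.
print(parse_error('<document id="DDI-DrugBank.d200"/>'))
```

The length of the file on disk never matters here, because the file is never opened.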
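So create a file object (binary mode, since the underlying expat parser reads bytes), for example:
fd = open("Train/DrugBank/Aciclovir_ddi.xml", "rb")
and then pass it to the parse method:
x2d.parse(fd)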
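For the many-files case in the question, the same idea can be wrapped in a loop. This is only a minimal sketch: the helper name parse_xml_files and the glob pattern are assumptions made for illustration, and each file is opened in binary mode so the parser can apply the encoding declared in the XML header.

```python
import glob

import xmltodict


def parse_xml_files(pattern):
    """Parse every XML file matching `pattern` (hypothetical helper).

    Returns a dict that maps each file path to its parsed content.
    """
    parsed = {}
    for path in sorted(glob.glob(pattern)):
        # Pass the open file object to xmltodict, not the path string.
        with open(path, "rb") as fd:
            parsed[path] = xmltodict.parse(fd)
    return parsed
```

With the directory layout from the question, a call might look like parse_xml_files("Train/DrugBank/*.xml"). The with block also closes each file once it has been parsed, which matters when there are many of them.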