How do I load an .xml file from disk as an element tree using lxml?
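I have a set of XML files on my drive, and I'd like to do the following with them:
- Load each one into lxml as an element tree and parse it with XPath
- Load another XML file as an element tree and use XPath to find the correct place to append the information
- Store the information parsed from the set of XML files in variables, so I can run some logic on the results before appending them back to the larger .xml file
I'm having trouble with file types and with loading the XML file correctly as an element tree so that lxml can operate on it. I've tried several different approaches but keep running into various problems. The current one is: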
TypeError: Argument '_parent' has incorrect type (expected lxml.etree._Element, got list)
from lxml import etree
from lxml import html
import requests

file = 'bgg.xml'
# parse the xml file from disk as an element tree in lxml?
treebgg = etree.parse(file)
# create a list of IDs to iterate through from the bgg.xml file
gameList = treebgg.xpath("//root/BGG/@ID")
# iterate through the IDs
for x in reversed(gameList):
    url = 'https://somewhere.com/xmlapi/' + str(x)
    page = requests.get(url)
    # pull an xml file from a web url and turn it into an element tree in lxml
    tree = html.fromstring(page.content)
    # set my root variable so I can append children to this location
    root = tree.xpath("//root/BGG[@ID=x]")
    name = tree.xpath("//somewhere/name[@primary='true']")
    # append child info into bgg.xml
    child = etree.SubElement(root, "Name")
    child.text = name
# write bgg.xml back to file
Get the root of the bgg.xml tree:
rootbgg = treebgg.getroot()
and use it as the parent to append children to:
child = etree.SubElement(rootbgg, "Name")
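The TypeError in the question appears to come from passing the result of xpath(), which is a list, as the parent of SubElement. The sketch below illustrates this with a small in-memory document; the sample XML and the variable name game_id are made up for illustration. It also assumes that the intent of [@ID=x] was to compare against the Python variable x, which XPath does not do by itself (a bare x is read as a child element named x); lxml lets you pass such values as XPath variables instead.

```python
from lxml import etree

# Hypothetical sample standing in for bgg.xml.
xml = b"<root><BGG ID='1'/><BGG ID='2'/></root>"
rootbgg = etree.fromstring(xml)

# Pass the Python value in as an XPath variable ($game_id) rather than
# writing the Python variable name inside the expression.
game_id = "2"
matches = rootbgg.xpath("//root/BGG[@ID=$game_id]", game_id=game_id)

# xpath() returns a list of matching elements, even for a single match.
print(type(matches).__name__, len(matches))

# SubElement needs one element as its parent, so index into the list.
parent = matches[0]
child = etree.SubElement(parent, "Name")
child.text = "Example"

print(etree.tostring(rootbgg).decode())
```

Indexing with matches[0] raises IndexError when nothing matches, so a real script would likely check that the list is non-empty first.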
I'm having another problem...how do I select the correct element? I don't want to append to the root of the xml file itself.
You now need to rework the way you iterate, so that you loop over the elements themselves rather than their IDs:
gameList = treebgg.xpath("//root/BGG")
# iterate through the BGG elements
for game in reversed(gameList):
    # attribute names are case-sensitive; the question's XPath uses @ID
    url = 'https://somewhere.com/xmlapi/' + game.attrib["ID"]
    page = requests.get(url)
    tree = html.fromstring(page.content)
    # TODO: get the name from tree and assign it to `name`
    # append child info into bgg.xml
    child = etree.SubElement(game, "Name")
    child.text = name
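Putting the pieces together, the loop above could be filled in roughly as follows. This is only a sketch under stated assumptions: the sample documents, the FAKE_RESPONSES table, and the helper names fetch_game_xml and add_names are all hypothetical, and the network call is stubbed out so the example runs on its own. It also parses the response with etree.fromstring instead of html.fromstring, on the assumption that the API returns XML.

```python
from lxml import etree

# Hypothetical stand-in for the contents of bgg.xml.
BGG_XML = b"<root><BGG ID='13'/><BGG ID='42'/></root>"

# Hypothetical API responses keyed by game ID; a real script would use
# requests.get(url).content here instead.
FAKE_RESPONSES = {
    "13": b"<somewhere><name primary='true'>Example Game</name>"
          b"<name>Alternate Title</name></somewhere>",
    "42": b"<somewhere><name>No primary name</name></somewhere>",
}


def fetch_game_xml(game_id):
    """Return raw XML bytes for one game (stubbed, no network access)."""
    return FAKE_RESPONSES[game_id]


def add_names(rootbgg):
    """Append a <Name> child to each <BGG> element that has a primary name."""
    for game in rootbgg.xpath("//root/BGG"):
        tree = etree.fromstring(fetch_game_xml(game.get("ID")))
        # xpath() returns a list; text() selects the string content directly.
        names = tree.xpath("//somewhere/name[@primary='true']/text()")
        if not names:
            continue  # nothing to append for this game
        child = etree.SubElement(game, "Name")
        child.text = str(names[0])


rootbgg = etree.fromstring(BGG_XML)
add_names(rootbgg)
print(etree.tostring(rootbgg, pretty_print=True).decode())

# With a file on disk, the same helper could be used roughly like this:
#   treebgg = etree.parse("bgg.xml")
#   add_names(treebgg.getroot())
#   treebgg.write("bgg.xml", xml_declaration=True, encoding="UTF-8")
```

Because the new child is attached to the game element being iterated, there is no need to look the element up again by ID, and the TypeError from passing a list as the parent does not arise.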