Minidom - extracting sub-nodes
I have some XML:
<sentence id="1086415:2">
<text> and there is much tasty food, all of it fresh and continually refilled.</text>
<Opinions>
<Opinion to="31" from="27" polarity="positive" category="FOOD#STYLE_OPTIONS" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#QUALITY" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#PRICES" target="food"/>
</Opinions>
</sentence>
<sentence id="1086415:3">
<text>I am not a vegetarian but, almost all the dishes were great.</text>
<Opinions>
<Opinion to="48" from="42" polarity="positive" category="FOOD#QUALITY" target="dishes"/>
</Opinions>
I'm trying to extract everything inside the Opinions tag so I can combine it with the text in a tuple. How can I do this with minidom? Currently opinion returns '\n'.
from xml.dom import minidom

xmldoc = minidom.parse("ABSA16_Restaurants_Train_SB1_v2.xml")
sentences = xmldoc.getElementsByTagName("sentence")
for sentence in sentences:
    text = sentence.getElementsByTagName("text")[0].firstChild.data
    # Expected the <Opinion> elements here, but this gives '\n'
    opinion = sentence.getElementsByTagName("Opinions")[0].firstChild.data
Thanks.
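For readers who do need to stay with minidom, here is a minimal sketch. The '\n' comes from the whitespace Text node that minidom keeps between <Opinions> and the first <Opinion>, so firstChild is not an element at all. Asking for the <Opinion> elements directly and reading their attributes sidesteps that. The inline SAMPLE string and its <sentences> root are assumptions standing in for the real file, and extract_opinions is a hypothetical helper name; for the real data you would pass minidom.parse("ABSA16_Restaurants_Train_SB1_v2.xml") instead of parseString(SAMPLE).

```python
from xml.dom import minidom

# Assumed sample data: the question's two <sentence> elements wrapped in a
# hypothetical <sentences> root so the string is a well-formed document.
SAMPLE = """<sentences>
<sentence id="1086415:2">
<text> and there is much tasty food, all of it fresh and continually refilled.</text>
<Opinions>
<Opinion to="31" from="27" polarity="positive" category="FOOD#STYLE_OPTIONS" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#QUALITY" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#PRICES" target="food"/>
</Opinions>
</sentence>
<sentence id="1086415:3">
<text>I am not a vegetarian but, almost all the dishes were great.</text>
<Opinions>
<Opinion to="48" from="42" polarity="positive" category="FOOD#QUALITY" target="dishes"/>
</Opinions>
</sentence>
</sentences>"""


def extract_opinions(doc):
    """Hypothetical helper: return a list of (text, [opinion attrs]) tuples."""
    results = []
    for sentence in doc.getElementsByTagName("sentence"):
        text = sentence.getElementsByTagName("text")[0].firstChild.data
        # Ask for the <Opinion> elements themselves instead of
        # <Opinions>.firstChild, which is the whitespace Text node "\n".
        # attributes.items() yields (name, value) pairs for each attribute.
        opinions = [
            dict(op.attributes.items())
            for op in sentence.getElementsByTagName("Opinion")
        ]
        results.append((text, opinions))
    return results


doc = minidom.parseString(SAMPLE)
for text, opinions in extract_opinions(doc):
    print(text.strip(), [o["category"] for o in opinions])
```

Because getElementsByTagName("Opinion") returns an empty list when a sentence has no opinions, such sentences simply get an empty list rather than raising an IndexError.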
Are you sure you need minidom?
From the docs:
Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.
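Unless you have a good reason, don't waste your time: use the standard Python xml.etree.ElementTree module. Its documentation has enough examples to solve your task. If you run into problems, feel free to ask in the comments.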
Beyond that, if you need to work with XML often, I recommend the third-party lxml, a more powerful tool with some extra batteries included.