Parse an XML file with multiple elements using Python with SAX
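I'm trying to parse an XML file using SAX in Python.
The document has multiple elements with the same name. I want to print some attributes of each element, but the program only prints the attributes of the last element encountered in the document.
This is the code: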
# art.py
import sys
from xml.sax import make_parser
from handlers import ArticleHandler

ch = ArticleHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)
print "TYPE:", ch.TYPE
print "SUBTYPE:", ch.SUBTYPE

# handlers.py
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    TYPE = ""
    SUBTYPE = ""

    def startElement(self, name, attrs):
        if name == "relation":
            self.TYPE = attrs.get("TYPE", "")
            self.SUBTYPE = attrs.get("SUBTYPE")
This is the XML:
<relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment">
...
</relation>
<relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership">
...
</relation>
For this input, the output is
TYPE:ORG-AFF
SUBTYPE:Membership
while the expected output is
TYPE:ORG-AFF
SUBTYPE:Employment
TYPE:ORG-AFF
SUBTYPE:Membership
How can I fix this?
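Your handler overwrites TYPE and SUBTYPE every time it sees a relation element, so only the last one survives until the print statements run. You have to rewrite your program to handle multiple relation tags, for example by using a list: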
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # One (TYPE, SUBTYPE) tuple per <relation> element, in document order.
        self.relations = []

    def startElement(self, name, attrs):
        if name == "relation":
            # Append instead of overwriting, so earlier elements are kept.
            self.relations.append((attrs.get("TYPE", ""), attrs.get("SUBTYPE")))

ch = ArticleHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)

for rel_type, subtype in ch.relations:
    print "TYPE:", rel_type
    print "SUBTYPE:", subtype
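The code above uses Python 2 print statements. For anyone on Python 3, the same list-based idea might look roughly like the sketch below. The names RelationCollector, collect_relations and SAMPLE are made up for illustration, and the enclosing <relations> root element is an assumption, since the question only shows a fragment of the document and a parser needs a single root. It reads from a byte string rather than sys.stdin so it can be run on its own.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RelationCollector(ContentHandler):
    """Collects a (TYPE, SUBTYPE) pair from every <relation> start tag."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def startElement(self, name, attrs):
        if name == "relation":
            # Append rather than assign, so earlier elements are not lost.
            self.relations.append(
                (attrs.get("TYPE", ""), attrs.get("SUBTYPE", ""))
            )


# Hypothetical sample input; the <relations> root element is an assumption.
SAMPLE = b"""<relations>
<relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment"></relation>
<relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership"></relation>
</relations>"""


def collect_relations(xml_bytes):
    """Parse xml_bytes and return the collected pairs in document order."""
    handler = RelationCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.relations


if __name__ == "__main__":
    for rel_type, subtype in collect_relations(SAMPLE):
        print("TYPE:", rel_type)
        print("SUBTYPE:", subtype)
```

To read from standard input as in the original program, xml.sax.parse(sys.stdin.buffer, handler) should work in place of parseString.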