How do I find a specific tag in an XML file in Python?

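I have an XML file in which I am trying to find a specific tag, but the tag hierarchy is not always the same. I want to find the "MotionVector" tag and then calculate the average motion vector value for a specific frame type (P, B, or I frames). Part of the XML file is shown below: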

<Picture id="1" poc="1">
    <GOPNr>0</GOPNr>
    <SubPicture structure="0">
        <Slice num="0">
            <Type>0</Type>
            <TypeString>SLICE_TYPE_P</TypeString>
            <NAL>
                <Num>5</Num>
                <Type>1</Type>
                <TypeString>NALU_TYPE_SLICE</TypeString>
                <Length>47048</Length>
            </NAL>
            <MacroBlock num="0">
                <MotionVector list="0">
                    <RefIdx>0</RefIdx>
                    <Difference>
                        <X>184</X>
                        <Y>149</Y>
                    </Difference>
                    <Absolute>
                        <X>184</X>
                        <Y>149</Y>
                    </Absolute>
                </MotionVector>
                <MotionVector list="0">
                    <RefIdx>0</RefIdx>
                    <Difference>
                        <X>10</X>
                        <Y>0</Y>
                    </Difference>
                    <Absolute>
                        <X>194</X>
                        <Y>149</Y>
                    </Absolute>
                </MotionVector>
                <Position>
                    <X>0</X>
                    <Y>0</Y>
                </Position>
                <QP_Y>21</QP_Y>
                <Type>1</Type>
                <TypeString>P_L0_L0_16x8</TypeString>
                <PredModeString>BLOCK_TYPE_P</PredModeString>
                <SkipFlag>0</SkipFlag>
            </MacroBlock>
            <MacroBlock num="1">
                <SubMacroBlock num="0">
                    <Type>0</Type>
                    <TypeString>P_L0_8x8</TypeString>
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>8</X>
                            <Y>-1</Y>
                        </Difference>
                        <Absolute>
                            <X>192</X>
                            <Y>148</Y>
                        </Absolute>
                    </MotionVector>
                </SubMacroBlock>
            </MacroBlock>
        </Slice>
    </SubPicture>
</Picture>

As you can see, the tag path to the X and Y values is Picture/SubPicture/Slice/MacroBlock/MotionVector/Absolute/X, but sometimes it is Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X. So when I use this code

 abs_x_tag=list(qpy_node.text for qpy_node in root.findall('Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X'))

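to extract all the X values, it does not find all of them. I also have to calculate the motion vectors for the different frame types based on this tag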

<TypeString>SLICE_TYPE_P</TypeString>

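and given these constraints, I do not know how to extract the X and Y values separately for each frame type. I can extract X and Y values with the code above, but I do not know how to select them by frame type. Can you help me solve this problem? Thanks.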

Here is an example of how you can parse this XML with BeautifulSoup.

Installing BeautifulSoup and lxml:

pip install BeautifulSoup4 lxml

Code:

from bs4 import BeautifulSoup


XML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""

soup = BeautifulSoup(XML, 'xml')

slices = soup.find_all('Slice')
for slice_node in slices:
    # recursive=False picks the Slice's own TypeString, not the one inside <NAL>
    slice_type = slice_node.find('TypeString', recursive=False).text
    print(f"Type: {slice_type}")
    # find_all searches every depth, so vectors inside SubMacroBlock are included
    vectors = slice_node.find_all('MotionVector')
    for vector in vectors:
        print("Vector:")
        difference = vector.find('Difference')
        difference_x = difference.find('X').text
        difference_y = difference.find('Y').text

        absolute = vector.find('Absolute')
        absolute_x = absolute.find('X').text
        absolute_y = absolute.find('Y').text

        # At this point you know the slice type and the x, y values

        print(f"Difference: {difference_x}, {difference_y}")
        print(f"Absolute: {absolute_x}, {absolute_y}")

Output:

Type: SLICE_TYPE_P
Vector:
Difference: 184, 149
Absolute: 184, 149
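
Inside the inner loop the slice type and the coordinates are both known, so the remaining step is to group them and average. A minimal sketch of that step follows; the helper name average_by_type and the (type, x, y) record layout are assumptions made for illustration, and the sample records are the three Absolute vectors from the question's XML.

```python
from collections import defaultdict
from statistics import mean


def average_by_type(records):
    """Return {slice_type: (mean_x, mean_y)} for (slice_type, x, y) records."""
    grouped = defaultdict(list)
    for slice_type, x, y in records:
        # .text values from the parser are strings, so convert before averaging
        grouped[slice_type].append((int(x), int(y)))
    return {
        slice_type: (mean(x for x, _ in points), mean(y for _, y in points))
        for slice_type, points in grouped.items()
    }


# Absolute vectors from the question's sample; all belong to a P slice
records = [
    ("SLICE_TYPE_P", "184", "149"),
    ("SLICE_TYPE_P", "194", "149"),
    ("SLICE_TYPE_P", "192", "148"),
]
averages = average_by_type(records)
print(averages)
```

In the parsing loop you would append (slice type, absolute_x, absolute_y) to a list instead of printing, then pass that list to the helper once all slices have been visited.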

We can do this in a simple way with xml.etree.ElementTree; see the output below:

import xml.etree.ElementTree as ET

SampleXML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""
# Use the commented lines below if you are reading from an XML file; replace <InputXML> with the absolute path to the file
# tree = ET.parse(r"<InputXML>")
# root = tree.getroot()
root = ET.fromstring(SampleXML)
TypeString = root.findall("./SubPicture/Slice/TypeString")
print("TypeString: ", TypeString[0].text)
# "//" matches MotionVector at any depth below MacroBlock, so vectors nested in SubMacroBlock are found too
abs_x_tag = root.findall("./SubPicture/Slice/MacroBlock//MotionVector/Absolute/X")
print("abs_x_tag: ", abs_x_tag[0].text)

Output:

TypeString:  SLICE_TYPE_P
abs_x_tag:  184
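
Since MotionVector can sit either directly under MacroBlock or one level deeper under SubMacroBlock, one option is to walk each Slice and search for MotionVector at any depth below it, keyed by that Slice's TypeString. The following is a sketch under stated assumptions, not a definitive solution: the function name vectors_by_slice_type is made up for illustration, the sample is a trimmed copy of the question's XML, and it assumes the Absolute values are the ones to average.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import mean

# Trimmed version of the question's sample: one vector directly under
# MacroBlock and one nested under SubMacroBlock.
SAMPLE = """
<Picture id="1" poc="1">
    <SubPicture structure="0">
        <Slice num="0">
            <TypeString>SLICE_TYPE_P</TypeString>
            <MacroBlock num="0">
                <MotionVector list="0">
                    <Absolute><X>184</X><Y>149</Y></Absolute>
                </MotionVector>
            </MacroBlock>
            <MacroBlock num="1">
                <SubMacroBlock num="0">
                    <MotionVector list="0">
                        <Absolute><X>192</X><Y>148</Y></Absolute>
                    </MotionVector>
                </SubMacroBlock>
            </MacroBlock>
        </Slice>
    </SubPicture>
</Picture>
"""


def vectors_by_slice_type(root):
    """Group absolute (x, y) motion vectors by the TypeString of their Slice."""
    grouped = defaultdict(list)
    # iter() searches the whole subtree, so root may be a single Picture
    # or a wrapper element that contains many Pictures.
    for slice_node in root.iter("Slice"):
        # A plain tag name only looks at direct children, so the
        # TypeString inside <NAL> is not picked up here.
        slice_type = slice_node.findtext("TypeString")
        # ".//" matches MotionVector at any depth below the Slice.
        for absolute in slice_node.findall(".//MotionVector/Absolute"):
            x = int(absolute.findtext("X"))
            y = int(absolute.findtext("Y"))
            grouped[slice_type].append((x, y))
    return grouped


root = ET.fromstring(SAMPLE)
grouped = vectors_by_slice_type(root)
for slice_type, points in grouped.items():
    mean_x = mean(x for x, _ in points)
    mean_y = mean(y for _, y in points)
    print(slice_type, len(points), mean_x, mean_y)
```

When reading from a file, root would come from ET.parse(...).getroot() instead of ET.fromstring; the grouping function stays the same.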