Parse comment before node in xml using Python

Question

Here is the format of an example xml node that I am parsing data from:

<!-- /StationName/BACnetTemp/MNB_1_HX/HiPressureAlarm -->
<node name="HiPressureAlarm" class="tridium.control.BinaryInputNode" module="coreRuntime" release="2.301.535.v1">
  <properties>
    <position><x>576</x><y>866</y></position>
    <timeDelay>
      <duration>60</duration>
    </timeDelay>
    <eventEnable>
      <toOffnormal>true</toOffnormal>
      <toFault>false</toFault>
      <toNormal>true</toNormal>
    </eventEnable>
    <alarmText>MCD Basement Re-Heat High Pressure Alarm</alarmText>
    <changeOfStateTime>2018-05-07T08:55:04.09-4</changeOfStateTime>
    <changeOfStateCount>848</changeOfStateCount>
    <elapsedActiveTime>
      <duration>126872</duration>
    </elapsedActiveTime>
    <activeInactiveText>
      <active>Alarm</active>
      <inactive>Normal</inactive>
    </activeInactiveText>
    <alarmValueEnabled>true</alarmValueEnabled>
  </properties>
</node>  <!-- HiPressureAlarm -->

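The comment at the beginning is the path, which I am trying to export to an Excel file along with some of the data. Everything else is working; the one thing I cannot do is associate the path with the node I am pulling the data from.

I can get all of the comments into a list with the following code: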

comments=soup.findAll(text=lambda text:isinstance(text, Comment))

I then tried the following to find the matching comment and use it as the node's path:

for comment in comments:
    x='/'+nodeName
    if x in comment:
        nodePath = comment

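The problem is that several nodes share the same name but have different paths, so this gave me the same path for every one of those nodes. So I added the following code immediately after the for loop:

if nodePath in comments:
    comments.remove(nodePath)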

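This should have worked, but the next problem is that the xml contains multiple instances of the same comment, and they do not appear in the same order in which the nodes and their data are found, so the paths do not line up with the correct nodes.

Is there a way to find the node, then assign the comment that precedes it to a variable, which I can then export to Excel?

Here is my full code for parsing the data: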

def alarms(self,soup):
        alarms=soup.find_all('toOffnormal')
        comments=soup.findAll(text=lambda text:isinstance(text, Comment))
        nodeStartList=[]
        for alarm in alarms:
            nodeStart=alarm.parent.parent.parent
            nodeStartList.append(nodeStart)
        dataList=[]
        for item in nodeStartList:
            nodeName=item['name']
            for comment in comments:
                x='/'+nodeName
                if x in comment:
                    nodePath = comment
            if nodePath in comments:
                comments.remove(nodePath)
            if item.find('timeDelay')!= None:
                timeDelay=item.find('timeDelay').get_text("|", strip=True)
            else:
                timeDelay='0'

            if item.find('eventEnable')!=None:
                toOffnormal=item.find('toOffnormal').get_text("| ", strip=True)
                toFault=item.find('toFault').get_text("| ", strip=True)
                toNormal=item.find('toNormal').get_text("| ", strip=True)
            else:
                toOffnormal='false'
                toFault='false'
                toNormal='false'

            alarmText=item.find('alarmText').get_text("| ", strip=True)


            if item.find('highLimit')!= None:
                highLimit=item.find('highLimit').get_text("| ", strip=True)
            else:
                highLimit='N/A'

            if item.find('lowLimit')!= None:
                lowLimit=item.find('lowLimit').get_text("| ", strip=True)
            else:
                lowLimit='N/A'

            if item.find('deadband'):
                deadband=item.find('deadband').get_text("| ", strip=True)
            else:
                deadband='N/A'

            if item.find('lowLimitEnabled'):
                lowLimitEnabled=item.find('lowLimitEnabled').get_text("| ", strip=True)
            else:
                lowLimitEnabled='false'

            if item.find('highLimitEnabled'):
                highLimitEnabled=item.find('highLimitEnabled').get_text("| ", strip=True)
            else:
                highLimitEnabled='false'

            itemList=[nodeName,nodePath,timeDelay,toOffnormal,toFault,toNormal,alarmText,highLimit,lowLimit,deadband,lowLimitEnabled,highLimitEnabled]
            dataList.append(itemList)

        self.df=pandas.DataFrame(dataList)
        self.df.columns=['pointName','pointPath','timeDelay','toOffnormal','toFault','toNormal','alarmText','highLimit','lowLimit','deadband','lowLimitEnabled','highLimitEnabled']
        return self.df

Answer 1

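I was able to achieve my goal by inserting nodePath=item.previous_element.previous_element after line 10 of the function (the nodeName=item['name'] line). In this file the first previous_element is the whitespace between the comment and the <node> tag, and the second is the comment itself. My result is below: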

def alarms(self,soup):
        alarms=soup.find_all('toOffnormal')
        #comments=soup.findAll(text=lambda text:isinstance(text, Comment))
        nodeStartList=[]
        for alarm in alarms:
            nodeStart=alarm.parent.parent.parent
            nodeStartList.append(nodeStart)
        dataList=[]
        for item in nodeStartList:
            nodeName=item['name']
            nodePath=item.previous_element.previous_element
            #for comment in comments:
                #x='/'+nodeName
                #if x in comment:
                    #nodePath = comment
            #if nodePath in comments:
                #comments.remove(nodePath)
            if item.find('timeDelay')!= None:
                timeDelay=item.find('timeDelay').get_text("|", strip=True)
            else:
                timeDelay='0'

            if item.find('eventEnable')!=None:
                toOffnormal=item.find('toOffnormal').get_text("| ", strip=True)
                toFault=item.find('toFault').get_text("| ", strip=True)
                toNormal=item.find('toNormal').get_text("| ", strip=True)
            else:
                toOffnormal='false'
                toFault='false'
                toNormal='false'

            alarmText=item.find('alarmText').get_text("| ", strip=True)


            if item.find('highLimit')!= None:
                highLimit=item.find('highLimit').get_text("| ", strip=True)
            else:
                highLimit='N/A'

            if item.find('lowLimit')!= None:
                lowLimit=item.find('lowLimit').get_text("| ", strip=True)
            else:
                lowLimit='N/A'

            if item.find('deadband'):
                deadband=item.find('deadband').get_text("| ", strip=True)
            else:
                deadband='N/A'

            if item.find('lowLimitEnabled'):
                lowLimitEnabled=item.find('lowLimitEnabled').get_text("| ", strip=True)
            else:
                lowLimitEnabled='false'

            if item.find('highLimitEnabled'):
                highLimitEnabled=item.find('highLimitEnabled').get_text("| ", strip=True)
            else:
                highLimitEnabled='false'

            itemList=[nodeName,nodePath,timeDelay,toOffnormal,toFault,toNormal,alarmText,highLimit,lowLimit,deadband,lowLimitEnabled,highLimitEnabled]
            dataList.append(itemList)

        self.df=pandas.DataFrame(dataList)
        self.df.columns=['pointName','pointPath','timeDelay','toOffnormal','toFault','toNormal','alarmText','highLimit','lowLimit','deadband','lowLimitEnabled','highLimitEnabled']
        return self.df
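The two chained previous_element hops rely on there being exactly one whitespace string between the path comment and the <node> tag. As a minimal sketch of a less layout-dependent variant, the lookup below uses Beautiful Soup's find_previous with a Comment filter instead. The sample markup, the preceding_comment helper, and the use of "html.parser" are assumptions made for illustration only; they are not taken from the original export or code.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, trimmed-down sample: two nodes share a name but have
# different path comments, and each node is followed by a trailing comment.
SAMPLE = """<root>
<!-- /StationName/BACnetTemp/MNB_1_HX/HiPressureAlarm -->
<node name="HiPressureAlarm"><properties></properties></node>  <!-- HiPressureAlarm -->
<!-- /StationName/BACnetTemp/MNB_2_HX/HiPressureAlarm -->
<node name="HiPressureAlarm"><properties></properties></node>  <!-- HiPressureAlarm -->
</root>"""


def preceding_comment(tag):
    """Return the stripped text of the nearest comment before `tag`, or None."""
    # find_previous walks backwards through the document from `tag`
    # and returns the first string that is a Comment instance.
    comment = tag.find_previous(string=lambda s: isinstance(s, Comment))
    return comment.strip() if comment is not None else None


# "html.parser" is used here only so the sketch needs no extra parser package.
soup = BeautifulSoup(SAMPLE, "html.parser")
paths = [preceding_comment(node) for node in soup.find_all("node")]
print(paths)
```

Inside the loop from the answer, this would amount to nodePath=preceding_comment(item). One caveat: if a node has no path comment of its own, the search keeps going backwards and returns an earlier comment (for example the trailing one after the previous node), so the result may still need a sanity check such as testing that it starts with '/'.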
