Extract pandas DataFrame from .xml file
I have an .xml file with the following contents:
<detailedreport xmlns:xsi="http://"false">
<severity level="5">
<category categoryid="3" categoryname="Buffer Overflow" pcirelated="false">
<cwe cweid="121" cwename="Stack-based Buffer Overflow" pcirelated="false" sans="120" certc="1160">
<description>
<text text="code."/>
</description>
<staticflaws>
<flaw severity="5" categoryname="Stack-based Buffer Overflow" count="1" issueid="6225" module="Jep" type="strcpy" description="This call to strcpy() contains a buffer overflow. The source string has an allocated size of 80 bytes " note="" cweid="121" remediationeffort="2" exploitLevel="0" categoryid="3" pcirelated="false">
<exploitability_adjustments>
<exploitability_adjustment score_adjustment="0">
</exploitability_adjustment>
</exploitability_adjustments>
</flaw>
</staticflaws>
</cwe>
</category>
</severity>
</detailedreport>
Below is the Python program I use to extract some fields from the "flaw" tag in the .xml file. When I print the fields, however, they are empty.
import pandas as pd
from lxml import etree

root = etree.parse(r'fps_change.xml')
xroot = root.getroot()
df_cols = ["categoryname", "issueid", "module"]
rows = []
for node in xroot:
    # s_name = node.attrib.get("name")
    s_categoryname = node.find("categoryname")
    s_issueid = node.find("issueid")
    s_module = node.find("module")
    rows.append({"categoryname": s_categoryname,
                 "issueid": s_issueid, "module": s_module})
out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)  # this prints empty.
Expected output:
Stack-based Buffer Overflow 6225 Jep
What changes should I make to my program to get the expected output?
from bs4 import BeautifulSoup

# XML text from the question
string = '''
<detailedreport xmlns:xsi="http://"false">
<severity level="5">
<category categoryid="3" categoryname="Buffer Overflow" pcirelated="false">
<cwe cweid="121" cwename="Stack-based Buffer Overflow" pcirelated="false" sans="120" certc="1160">
<description>
<text text="code."/>
</description>
<staticflaws>
<flaw severity="5" categoryname="Stack-based Buffer Overflow" count="1" issueid="6225" module="Jep" type="strcpy" description="This call to strcpy() contains a buffer overflow. The source string has an allocated size of 80 bytes " note="" cweid="121" remediationeffort="2" exploitLevel="0" categoryid="3" pcirelated="false">
<exploitability_adjustments>
<exploitability_adjustment score_adjustment="0">
</exploitability_adjustment>
</exploitability_adjustments>
</flaw>
</staticflaws>
</cwe>
</category>
</severity>
</detailedreport>'''

df_cols = ["categoryname", "issueid", "module"]

# Parse leniently as HTML, then read the attributes of the first <flaw> tag
html_obj = BeautifulSoup(string, "html.parser")
flaw = html_obj.find('flaw')
print([flaw[key] for key in df_cols])
# ['Stack-based Buffer Overflow', '6225', 'Jep']
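For reference, the program in the question prints nothing useful for two reasons: `for node in xroot` only visits the direct children of the root (the `<severity>` element), and `find("categoryname")` searches for a child element with that name, whereas `categoryname`, `issueid` and `module` are attributes of `<flaw>`. A minimal sketch of an element-tree version follows. It uses the standard-library `xml.etree.ElementTree`, whose `iter()` and `get()` calls also exist on lxml elements. The `SAMPLE_XML` string and the `extract_flaw_rows` helper are illustrative assumptions, not part of the original post; the sample is a trimmed, well-formed version of the report, since a strict XML parser will likely reject the root tag exactly as pasted above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the report (assumption: the real file parses as XML).
SAMPLE_XML = """
<detailedreport>
  <severity level="5">
    <category categoryid="3" categoryname="Buffer Overflow">
      <cwe cweid="121" cwename="Stack-based Buffer Overflow">
        <staticflaws>
          <flaw severity="5" categoryname="Stack-based Buffer Overflow"
                issueid="6225" module="Jep" type="strcpy"/>
        </staticflaws>
      </cwe>
    </category>
  </severity>
</detailedreport>
"""

df_cols = ["categoryname", "issueid", "module"]


def extract_flaw_rows(xml_text, columns):
    """Return one dict per <flaw> element, holding the requested attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so nested <flaw> elements are found;
    # looping over the root directly only visits its immediate children.
    for flaw in root.iter("flaw"):
        # The wanted fields are attributes, so read them with get(), not find().
        rows.append({col: flaw.get(col) for col in columns})
    return rows


rows = extract_flaw_rows(SAMPLE_XML, df_cols)
print(rows)
```

Passing the result to `pd.DataFrame(rows, columns=df_cols)` should then give one row per flaw. An attribute that is missing on a given `<flaw>` comes back as `None` rather than raising an error.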