Parsing XML using Python and creating an Excel report - ElementTree/lxml
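I am trying to parse many XML test result files and get the necessary data (test case name, test result, failure message, etc.) into Excel format. I decided to go with Python.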
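My XML files are large and have the format below. Failed cases have a <failure> message, <system-out> & <system-err>, whereas passed cases only have <system-out>.
My requirement is to create an Excel file containing the test case name, the test status (pass/fail) and the test failure message.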
<?xml version="1.0" encoding="UTF-8"?>
<testsuites xmlns:a="http://microsoft.com/schemas/VisualStudio/TeamTest/2006"
xmlns:b="http://microsoft.com/schemas/VisualStudio/TeamTest/2010">
<testsuite name="MSTestSuite" tests="192" time="0" failures="16" errors="0" skipped="0">
<testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
<failure message="unknown error: jQuery is not defined
">
</failure>
<system-out>Given the user is on the landing page
-> error: unknown error: jQuery is not defined
</system-out>
<system-err>unknown error: jQuery is not defined
</system-err>
</testcase>
<testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
<failure message="unknown error: jQuery is not defined
">
</failure>
<system-out>Given the user is on the landing page
-> error: unknown error: jQuery is not defined
</system-out>
<system-err>unknown error: jQuery is not defined
</system-err>
</testcase>
<testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
<failure message="unknown error: jQuery is not defined
">
</failure>
<system-out>Given the user is on the landing page
-> error: unknown error: jQuery is not defined
</system-out>
<system-err>unknown error: jQuery is not defined
</system-err>
</testcase>
<testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
<system-out>Given the user is on the landing page
-> error: unknown error: jQuery is not defined
</system-out>
</testcase>
</testsuite>
</testsuites>
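I came up with the code below. Please excuse any basic mistakes, as I am new to this. With this code I can retrieve the test case name and class name, but I am unable to pick up the failure message, system-out and system-err. Although these tags are also children of the testcase tag, I can't get at them. Can someone help me with this? Thanks!
With only the test case name and class name, I am able to write to Excel.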
## Parsing XML files ###
import os
import pandas as pd
from lxml import etree

df_reports = pd.DataFrame()
df = pd.DataFrame()
i = 0
pass_count = 0
fail_count = 0
path = '/TestReports_Backup/'
files = os.listdir(path)
print(len(files))
for file in files:
    file_path = path+file
    print(file_path)
    tree = etree.parse(file_path)
    testcases = tree.xpath('.//testcase')
    systemout = tree.xpath('.//testcase/system-out')
    # note: xpath() returns a *list* of every <failure> in the file,
    # so failure.attrib below raises AttributeError
    failure = tree.xpath('.//testcase/failure')
    for testcase in testcases:
        test = {}
        test['TestCaseName'] = testcase.attrib['name']
        test['Classname'] = testcase.attrib['classname']
        test['TestStatus'] = failure.attrib['message']
        df = pd.DataFrame(test, index=[i])
        i = i + 1
        df_reports = pd.concat([df_reports, df])
print(df_reports)
df.head()
df_reports.to_csv('/TestReports_Backup/Reports.csv')
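The root cause is that tree.xpath('.//testcase/failure') returns a list of every <failure> element in the file, and a list has no .attrib; the lookup has to happen relative to each <testcase>. Below is a minimal sketch of that per-testcase lookup, assuming one dict per test case is the desired row shape. It uses the standard-library xml.etree.ElementTree (swapped in so it runs without lxml), an inline sample trimmed from the XML above, and a hypothetical helper parse_testcases; the column names mirror the question's.

```python
import xml.etree.ElementTree as ET

# Inline sample trimmed from the question's format: one failed and one passed test.
SAMPLE = """<testsuites>
  <testsuite name="MSTestSuite">
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <failure message="unknown error: jQuery is not defined
"></failure>
      <system-out>Given the user is on the landing page</system-out>
      <system-err>unknown error: jQuery is not defined</system-err>
    </testcase>
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <system-out>Given the user is on the landing page</system-out>
    </testcase>
  </testsuite>
</testsuites>"""


def parse_testcases(root):
    """Hypothetical helper: return one dict per <testcase>, pass/fail derived from <failure>."""
    rows = []
    for testcase in root.iter("testcase"):
        # Search for <failure> inside THIS testcase only, not across the whole file.
        failure = testcase.find("failure")
        # Compare with None explicitly: an Element with no children is falsy.
        failed = failure is not None
        rows.append({
            "TestCaseName": testcase.get("name"),
            "Classname": testcase.get("classname"),
            "TestStatus": "fail" if failed else "pass",
            "FailureMessage": failure.get("message", "").strip() if failed else "",
        })
    return rows


rows = parse_testcases(ET.fromstring(SAMPLE))
for row in rows:
    print(row["TestStatus"], "|", row["FailureMessage"])
```

To apply it to the real files, one option is to call parse_testcases(ET.parse(file_path).getroot()) inside the question's for file in files loop, extend a single list with the results, and build the DataFrame once at the end; pd.DataFrame(rows).to_excel('Reports.xlsx', index=False) then writes the Excel file, provided an engine such as openpyxl is installed. lxml elements expose the same find(), get() and iter() methods, so the helper should also work on etree.parse(file_path).getroot().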
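Since your XML is relatively flat, consider using a list/dictionary comprehension to collect, for each testcase, its own attrib dictionary, the attrib dictionaries of its child elements, and the text of its child elements. From there, call pd.concat once, outside the loop. The code below uses dictionary merging (Python 3.5+).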
import os
import pandas as pd
from lxml import etree

path = "/TestReports_Backup"

def proc_xml(file_path):
    tree = etree.parse(os.path.join(path, file_path))
    data = [
        { **n.attrib,                                                   # testcase attributes
          **{k:v for el in n.xpath("*") for k,v in el.attrib.items()},   # child attributes, e.g. failure message
          **{el.tag: el.text.strip() for el in n.xpath("*")             # non-empty child text
             if el.text and el.text.strip()!=''}
        } for n in tree.xpath("//testcase")
    ]
    return pd.DataFrame(data)

df_reports = pd.concat([
    proc_xml(f)
    for f in os.listdir(path)
    if f.endswith(".xml")
])
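To see what the dictionary merge produces per test case, here is a small sketch of the same three-way merge run on two trimmed testcases. It swaps in the standard-library xml.etree.ElementTree, with list(n) standing in for lxml's n.xpath("*"); flatten_testcase is a hypothetical name. Later dictionaries in the merge overwrite earlier keys on a name clash, and a passing test simply has no message key, which is why pandas fills that column with NaN.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<testsuite>
  <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
    <failure message="unknown error: jQuery is not defined"></failure>
    <system-out>Given the user is on the landing page</system-out>
    <system-err>unknown error: jQuery is not defined</system-err>
  </testcase>
  <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
    <system-out>Given the user is on the landing page</system-out>
  </testcase>
</testsuite>"""


def flatten_testcase(n):
    """Hypothetical helper: merge a testcase's attributes, child attributes and child text."""
    children = list(n)  # direct child elements, like n.xpath("*") in lxml
    return {
        **n.attrib,                                                  # classname, name, time
        **{k: v for el in children for k, v in el.attrib.items()},   # e.g. failure/@message
        **{el.tag: (el.text or "").strip()                           # system-out, system-err
           for el in children if (el.text or "").strip()},
    }


records = [flatten_testcase(n) for n in ET.fromstring(SAMPLE).iter("testcase")]
for r in records:
    print(sorted(r))
```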
Output
classname name time message system-out system-err
0 dfsgsgg Results are displayed 27.8096966 unknown error: jQuery is not defined\n Given the user is on the landing page\n -... unknown error: jQuery is not defined
1 dfsgsgg Results are displayed 27.8096966 unknown error: jQuery is not defined\n Given the user is on the landing page\n -... unknown error: jQuery is not defined
2 dfsgsgg Results are displayed 27.8096966 unknown error: jQuery is not defined\n Given the user is on the landing page\n -... unknown error: jQuery is not defined
3 dfsgsgg Results are displayed 27.8096966 NaN Given the user is on the landing page\n -... NaN
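In the output above, message is NaN for the passing test, which is exactly the pass/fail signal the question asks for. A minimal sketch, assuming records shaped like the dictionaries proc_xml builds (hard-coded here from the sample XML), that derives TestStatus from whether a message key is present and writes only the report columns as CSV, which Excel opens directly. to_report_rows is a hypothetical helper; on the answer's DataFrame the equivalent check would be df_reports['message'].isna().

```python
import csv
import io

# Records in the shape produced by the dictionary merge, hard-coded from the sample XML.
records = [
    {"classname": "dfsgsgg", "name": "Results are displayed", "time": "27.8096966",
     "message": "unknown error: jQuery is not defined",
     "system-out": "Given the user is on the landing page",
     "system-err": "unknown error: jQuery is not defined"},
    {"classname": "dfsgsgg", "name": "Results are displayed", "time": "27.8096966",
     "system-out": "Given the user is on the landing page"},
]

FIELDS = ["TestCaseName", "Classname", "TestStatus", "FailureMessage"]


def to_report_rows(records):
    """Hypothetical helper: keep the report columns; no 'message' key means the test passed."""
    return [
        {
            "TestCaseName": r["name"],
            "Classname": r["classname"],
            "TestStatus": "fail" if "message" in r else "pass",
            "FailureMessage": r.get("message", ""),
        }
        for r in records
    ]


report = to_report_rows(records)
buf = io.StringIO()  # swap for open("Reports.csv", "w", newline="") to write a file
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(report)
print(buf.getvalue())
```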
Also, as of pandas v1.3, read_xml is available (its default parser is lxml, and by default it retrieves all attributes and child elements at the specified xpath):
import os
import pandas as pd

path = "/TestReports_Backup"
df_reports = pd.concat([
    pd.read_xml(os.path.join(path, f), xpath="//testcase")
    for f in os.listdir(path)
    if f.endswith(".xml")
])