How to parse this XML data into a table in Python
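I have the following XML and I want to parse it into a table. I've been searching around and haven't found a good answer. The difficult parts are:
- the headers and the data are in different subtrees
- all the inner tags have the same name (th or td)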
Vaccine | Date | Status | Dose | Route | Site | Comment | ID |
---|---|---|---|---|---|---|---|
Vaccine A | Mon,Mar 15,2019 | Done | imm. | | | | |
Vaccine B | Tue,Sep 20, 2019 | Done | imm. | | | | |
```xml
<ns0:text xmlns:ns0="urn:hl7-org:v3">
  <ns0:table border="1" width="100%">
    <ns0:thead>
      <ns0:tr>
        <ns0:th>Vaccine</ns0:th>
        <ns0:th>Date</ns0:th>
        <ns0:th>Status</ns0:th>
        <ns0:th>Dose</ns0:th>
        <ns0:th>Route</ns0:th>
        <ns0:th>Site</ns0:th>
        <ns0:th>Comment</ns0:th>
      </ns0:tr>
    </ns0:thead>
    <ns0:tbody>
      <ns0:tr>
        <ns0:td>
          <ns0:content ID="immunizationDescription1">Vaccin A</ns0:content>
        </ns0:td>
        <ns0:td>Monday, March 15, 2019 at 4:46:00 pm</ns0:td>
        <ns0:td>Done</ns0:td>
        <ns0:td>
        </ns0:td>
        <ns0:td />
        <ns0:td />
        <ns0:td />
      </ns0:tr>
      <ns0:tr>
        <ns0:td>
          <ns0:content ID="immunizationDescription2">Vaccine B</ns0:content>
        </ns0:td>
        <ns0:td>Tuesday, September 20, 2019 at 12:00:00 am</ns0:td>
        <ns0:td>Done</ns0:td>
        <ns0:td>
        </ns0:td>
        <ns0:td />
        <ns0:td />
        <ns0:td />
      </ns0:tr>
    </ns0:tbody>
  </ns0:table>
</ns0:text>
```
As long as you take care of the namespaces, you should be OK with something like this, though it's a bit convoluted:
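```python
import pandas as pd
from lxml import etree

nsmap = {"ns0": "urn:hl7-org:v3"}

# hypothetical file name: the XML from the question saved to disk
doc = etree.parse("immunizations.xml")

rows = []
cols = doc.xpath('//ns0:thead//ns0:tr//ns0:th/text()', namespaces=nsmap)
cols.append("ID")
for p in doc.xpath('//ns0:tbody//ns0:tr', namespaces=nsmap):
    vaccine = p.xpath('.//ns0:content/text()', namespaces=nsmap)[0]
    row_id = p.xpath('.//ns0:content/@ID', namespaces=nsmap)[0]  # avoid shadowing built-in id()
    # keep only the text of the 2nd cell before " at", i.e. drop the time of day
    date = p.xpath('substring-before(.//ns0:td[position()=2]/text()," at")', namespaces=nsmap)
    # the remaining cells (Status, Dose, Route, Site, Comment), matched by position
    status = p.xpath('.//ns0:td[position()>2]', namespaces=nsmap)
    row = []
    row.extend([vaccine, date])
    row.extend([sta.text.strip() if sta.text else "" for sta in status])
    # you could combine the previous two lines into one, but that would make it somewhat less readable
    row.append(row_id)
    rows.append(row)

df = pd.DataFrame(rows, columns=cols)
print(df)
```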
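If lxml isn't available, the same position-based idea can be sketched with the standard library's `xml.etree.ElementTree` instead (a swapped-in library, not what the answer above uses). The helper name `parse_table` is hypothetical. It pairs each `th` in `thead` with the `td` at the same position in every `tbody` row, uses `itertext()` so the text nested inside `content` is picked up, and, unlike the XPath version, keeps the raw Date text including the time.

```python
import xml.etree.ElementTree as ET

# Namespace used by the question's XML.
NS = {"ns0": "urn:hl7-org:v3"}


def parse_table(xml_text):
    """Hypothetical helper: return (columns, rows) for the first ns0:table.

    Assumes the root element contains the table (as ns0:text does here).
    Headers and cells are matched purely by position.
    """
    root = ET.fromstring(xml_text)
    table = root.find(".//ns0:table", NS)
    cols = [
        th.text.strip() if th.text else ""
        for th in table.findall("./ns0:thead/ns0:tr/ns0:th", NS)
    ]
    cols.append("ID")
    rows = []
    for tr in table.findall("./ns0:tbody/ns0:tr", NS):
        # itertext() also yields text of nested elements such as ns0:content;
        # empty or whitespace-only cells end up as "".
        cells = ["".join(td.itertext()).strip() for td in tr.findall("ns0:td", NS)]
        content = tr.find(".//ns0:content", NS)
        row_id = content.get("ID", "") if content is not None else ""
        rows.append(cells + [row_id])
    return cols, rows
```

The returned `cols` and `rows` can be passed straight to `pandas.DataFrame(rows, columns=cols)`.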
Output (excuse the formatting):

```
Vaccine Date Status Dose Route Site Comment ID
0 Vaccin A Monday, March 15, 2019 Done immunizationDescription1
1 Vaccine B Tuesday, September 20, 2019 Done immunizationDescription2
```
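If you'd rather keep the time than cut the cell at `" at"` with `substring-before`, one option is to parse the raw Date cell text with `datetime.strptime`. This is a sketch assuming the cells always follow the sample's English format; `parse_cell_date` and `CELL_FORMAT` are hypothetical names. Note that `strptime` reads the weekday name but does not check it against the date, and in the sample the names don't match the calendar (15 March 2019 was a Friday), so any weekday shown later should come from the parsed value, not the text.

```python
from datetime import datetime

# Hypothetical format string matching cells like
# "Monday, March 15, 2019 at 4:46:00 pm" (assumes the default C/English locale).
CELL_FORMAT = "%A, %B %d, %Y at %I:%M:%S %p"


def parse_cell_date(text):
    """Parse a raw Date cell into a datetime, keeping the time of day."""
    return datetime.strptime(text.strip(), CELL_FORMAT)
```

For example, `parse_cell_date("Monday, March 15, 2019 at 4:46:00 pm").strftime("%b %d, %Y")` gives `Mar 15, 2019`, and `12:00:00 am` parses to midnight.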