Convert XML to CSV with Python

Hi all,

In the code below I am trying to convert the XML at https://issat.ttn.tn/cu/export/akouda.php to a CSV file.

Code:

import requests
import xml.etree.ElementTree as Xet
import pandas as pd
from html import unescape
url = "https://issat.ttn.tn/cu/export/akouda.php"

s = unescape(requests.get(url).text)[5:-6]

df = pd.read_xml(s, xpath="//phases/* | //time")
#df["value"] = df["value"].ffill()
df
df.to_csv('output0.csv')

Here are some of the results:

,value,phases,id,act_energy,react_energy,current_inst,voltage_inst,power_inst,power_fact,thd
0,2022-04-14 15:45:00,,,,,,,,,
1,,,0.0,0.3000000000001819,0.4324445747717669,2.0,241.7,0.27,0.57,27.39
2,,,1.0,0.0,0.0,13.06,242.5,0.66,0.2,22.69
3,,,2.0,0.0,0.0,1.07,243.7,0.15,0.58,48.05
4,2022-04-14 15:30:00,,,,,,,,,
5,,,0.0,0.2999999999999545,0.108885460271677,1.02,240.4,0.23,0.94,23.7
6,,,1.0,0.0,0.0,14.54,241.0,0.86,0.24,23.99
7,,,2.0,0.0,0.0,1.07,243.5,0.15,0.59,48.08
8,2022-04-14 15:15:00,,,,,,,,,
9,,,0.0,0.3999999999998636,0.5618044649492236,0.7,243.1,0.1,0.58,42.46
10,,,1.0,0.0,0.0,17.82,241.9,1.99,0.46,33.59
11,,,2.0,0.0,0.0,1.08,246.3,0.15,0.58,51.09
12,2022-04-14 15:00:00,,,,,,,,,
13,,,0.0,0.6000000000001364,0.8427066974243144,0.71,241.7,0.1,0.58,44.02
14,,,1.0,0.0,0.0,18.74,240.5,2.21,0.49,31.3
15,,,2.0,0.0,0.0,1.08,245.3,0.15,0.58,51.77

I need to:

  1. Delete rows such as 0, 4, 8 and 12, which have a date but no readings.
  2. Keep only the rows where id = 1.
  3. Delete the phases column.

Can anyone help, please?

Try:

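import requests
import pandas as pd
from html import unescape
from io import StringIO

url = "https://issat.ttn.tn/cu/export/akouda.php"

# The page serves HTML-escaped XML; unescape it and slice off the 5 leading
# and 6 trailing characters that wrap the document.
s = unescape(requests.get(url).text)[5:-6]

# StringIO avoids the deprecation of literal XML strings in pandas 2.1+.
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")

# Copy each timestamp down onto the phase rows that follow it.
df["value"] = df["value"].ffill()
df = df.drop(columns="phases")
# Not needed if you only want id == 1; this drops the timestamp-only rows:
# df = df[~df.isna().any(axis=1)]
print(df[df["id"] == 1])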

Prints:

                    value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
2     2022-04-14 23:15:00  1.0         0.0           0.0         12.06         241.0        0.83        0.28  22.56
6     2022-04-14 23:00:00  1.0         0.0           0.0         12.04         240.5        0.82        0.28  22.57
10    2022-04-14 22:45:00  1.0         0.0           0.0         12.04         240.2        0.82        0.28  22.56
14    2022-04-14 22:30:00  1.0         0.0           0.0         12.03         240.1        0.82        0.28  22.24
18    2022-04-14 22:15:00  1.0         0.0           0.0         12.01         240.1        0.82        0.28  22.52
22    2022-04-14 22:00:00  1.0         0.0           0.0         12.00         239.8        0.82        0.28  22.74
26    2022-04-14 21:45:00  1.0         0.0           0.0         11.96         239.9        0.82        0.28  22.58

...
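
To see why the forward fill lines the timestamps up with the readings, here is a minimal sketch on a hand-built frame. It mimics the first eight rows of the question's output (values copied from there, with only two of the reading columns kept), so it runs without downloading or parsing anything. The names `result` and `csv_text` are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the frame read_xml returns: one row per <time>
# (only "value" set) followed by one row per phase (only readings set).
df = pd.DataFrame(
    {
        "value": ["2022-04-14 15:45:00", None, None, None,
                  "2022-04-14 15:30:00", None, None, None],
        "phases": [None] * 8,
        "id": [None, 0.0, 1.0, 2.0, None, 0.0, 1.0, 2.0],
        "current_inst": [None, 2.0, 13.06, 1.07, None, 1.02, 14.54, 1.07],
        "voltage_inst": [None, 241.7, 242.5, 243.7, None, 240.4, 241.0, 243.5],
    }
)

# Copy each timestamp down onto the phase rows that follow it.
df["value"] = df["value"].ffill()

# Drop the empty column, then keep only phase 1. The timestamp-only rows
# have NaN in "id", so the comparison filters them out as well.
result = df.drop(columns="phases")
result = result[result["id"] == 1]

# index=False leaves the original row numbers out of the CSV.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead of capturing the string writes the file the question asks for.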

Consider running two read_xml calls, adjusting the xpath and using attrs_only. Because the two results line up one-to-one (one <phase> with @id=1 for each <time>), join the results:

...
from io import StringIO  # literal XML strings are deprecated in pandas 2.1+

time_df = pd.read_xml(StringIO(s), xpath="//time", attrs_only=True, names=["time"])
phase_df = pd.read_xml(StringIO(s), xpath="//phase[@id=1]")

time_phase_df = time_df.join(phase_df)
time_phase_df
                     time  id  act_energy  ...  power_inst  power_fact    thd
0     2022-04-15 00:00:00   1           0  ...        0.84        0.28  22.35
1     2022-04-14 23:45:00   1           0  ...        0.83        0.28  23.16
2     2022-04-14 23:30:00   1           0  ...        0.83        0.28  22.43
3     2022-04-14 23:15:00   1           0  ...        0.83        0.28  22.56
4     2022-04-14 23:00:00   1           0  ...        0.82        0.28  22.57
                  ...  ..         ...  ...         ...         ...    ...
1289  2022-04-01 02:15:00   1           0  ...        0.69        0.25  22.70
1290  2022-04-01 02:00:00   1           0  ...        0.69        0.25  22.66
1291  2022-04-01 01:45:00   1           0  ...        0.69        0.25  22.46
1292  2022-04-01 01:30:00   1           0  ...        0.69        0.25  22.00
1293  2022-04-01 01:25:00   1           0  ...        0.69        0.25  22.34
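
If pandas or lxml is not available, the same pairing can be sketched with the standard library alone. This is a swapped-in technique, not what either answer uses. The sample document is an assumption about the feed's layout, inferred from the xpath expressions above: `<time value="...">` elements, each holding a `<phases>` element with `<phase>` children whose readings are attributes. The root tag `data`, the reduced attribute set, and the helper name `phase_rows` are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed layout of the feed; root tag and attribute subset are illustrative.
SAMPLE = """
<data>
  <time value="2022-04-14 15:45:00">
    <phases>
      <phase id="0" current_inst="2.0" voltage_inst="241.7"/>
      <phase id="1" current_inst="13.06" voltage_inst="242.5"/>
      <phase id="2" current_inst="1.07" voltage_inst="243.7"/>
    </phases>
  </time>
  <time value="2022-04-14 15:30:00">
    <phases>
      <phase id="0" current_inst="1.02" voltage_inst="240.4"/>
      <phase id="1" current_inst="14.54" voltage_inst="241.0"/>
      <phase id="2" current_inst="1.07" voltage_inst="243.5"/>
    </phases>
  </time>
</data>
"""


def phase_rows(xml_text, phase_id="1"):
    """Yield one dict per <time>, holding its timestamp and the chosen phase."""
    root = ET.fromstring(xml_text.strip())
    for time_el in root.iter("time"):
        for phase in time_el.iter("phase"):
            if phase.get("id") == phase_id:
                # Attribute values stay strings; convert them if needed.
                yield {"time": time_el.get("value"), **phase.attrib}


rows = list(phase_rows(SAMPLE))

# Write to an in-memory buffer; swap in open("output.csv", "w", newline="")
# to produce a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because each reading is looked up inside its own `<time>` element, this pairing does not depend on the two sequences having the same length, which the positional join relies on.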

Also, starting with pandas 1.5, read_xml supports parsing dates:

time_df = pd.read_xml(
    StringIO(s),  # needs: from io import StringIO
    xpath="//time",
    attrs_only=True,
    names=["time"],
    parse_dates=["time"],  # refers to the column name after renaming
)
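
On pandas versions older than 1.5, the column can be converted after loading with `pd.to_datetime`. A minimal sketch, using a hand-built frame in place of `time_df` (timestamps copied from the output above) and assuming the `YYYY-MM-DD HH:MM:SS` format seen there:

```python
import pandas as pd

# Stand-in for time_df as returned by read_xml without parse_dates.
time_df = pd.DataFrame(
    {"time": ["2022-04-15 00:00:00", "2022-04-14 23:45:00"]}
)

# Convert the string column to datetime64; an explicit format makes a
# malformed timestamp raise instead of being guessed at.
time_df["time"] = pd.to_datetime(time_df["time"], format="%Y-%m-%d %H:%M:%S")

print(time_df.dtypes)
```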