Irregular data export to csv file with NaN values
I have this data in a csv file:
                Date/Time      kWh  kVArh        kVA      PF
0     2021-01-01 00:30:00   471.84   0.00   943.6800  1.0000
1     2021-01-01 01:00:00   491.04   1.44   982.0842  1.0000
2     2021-01-01 01:30:00   475.20   0.00   950.4000  1.0000
3     2021-01-01 02:00:00   470.88   0.00   941.7600  1.0000
4     2021-01-01 02:30:00   466.56   0.00   933.1200  1.0000
...                   ...      ...    ...        ...     ...
9223  2021-07-14 04:00:00  1104.00  53.28  2210.5698  0.9988
9224  2021-07-14 04:30:00  1156.30  49.92  2314.7542  0.9991
9225  2021-07-14 05:00:00  1176.00  37.92  2353.2224  0.9995
9226  2021-07-14 05:30:00  1177.00  27.36  2354.6359  0.9997
9227  2021-07-14 06:00:00  1196.60  22.56  2393.6253  0.9998
I then read it with the code below, calculate the mean for each hour, and export the result to a csv file.
import pandas as pd

file = pd.read_csv('Electricity_data.csv',
                   sep=',',
                   skiprows=0,
                   dayfirst=True,
                   parse_dates=['Date/Time'])
pd_mean = file.groupby(pd.Grouper(key='Date/Time', freq='H')).mean().reset_index()
pd_mean.to_csv("data_1h_year_.csv")
However, when I run it, my final file has a gap.
Data before running the code (date: March 1, 2021):
               Date/Time     kWh  kVArh        kVA   PF
90   2021-02-01 21:30:00  496.83   0.00   993.6600  1.0
91   2021-02-01 22:00:00  486.72   0.00   973.4400  1.0
92   2021-02-01 22:30:00  490.08   0.00   980.1600  1.0
93   2021-02-01 23:00:00  503.00   1.92  1006.0073  1.0
94   2021-02-01 23:30:00  484.84   0.00   969.6800  1.0
95   2021-03-01 00:00:00  484.80   0.00   969.6000  1.0
96   2021-03-01 00:30:00  487.68   0.00   975.3600  1.0
97   2021-03-01 01:00:00  508.30   1.44  1016.6041  1.0
98   2021-03-01 01:30:00  488.66   0.00   977.3200  1.0
99   2021-03-01 02:00:00  486.24   0.00   972.4800  1.0
100  2021-03-01 02:30:00  495.36   1.44   990.7242  1.0
101  2021-03-01 03:00:00  484.32   0.00   968.6400  1.0
102  2021-03-01 03:30:00  485.76   0.00   971.5200  1.0
103  2021-03-01 04:00:00  492.48   1.44   984.9642  1.0
104  2021-03-01 04:30:00  476.16   0.00   952.3200  1.0
105  2021-03-01 05:00:00  477.12   0.00   954.2400  1.0
Data after running the code (date: 03/01/2021):
              Date/Time       kWh   kVArh         kVA       PF
45  2021-01-02 21:00:00  1658.650  292.32  3368.45000  0.98485
46  2021-01-02 22:00:00  1622.150  291.60  3296.34415  0.98420
47  2021-01-02 23:00:00  1619.300  261.36  3280.52380  0.98720
48  2021-01-03 00:00:00       NaN     NaN         NaN      NaN
49  2021-01-03 01:00:00       NaN     NaN         NaN      NaN
50  2021-01-03 02:00:00       NaN     NaN         NaN      NaN
51  2021-01-03 03:00:00       NaN     NaN         NaN      NaN
52  2021-01-03 04:00:00       NaN     NaN         NaN      NaN
53  2021-01-03 05:00:00       NaN     NaN         NaN      NaN
54  2021-01-03 06:00:00  1202.400  158.40  2425.57730  0.99140
55  2021-01-03 07:00:00  1209.375  168.00  2441.98105  0.99050
56  2021-01-03 08:00:00  1260.950  162.72  2542.89820  0.99175
57  2021-01-03 09:00:00  1308.975  195.60  2647.07935  0.98900
58  2021-01-03 10:00:00  1334.150  193.20  2696.17005  0.98965
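I don't know why this happens, but the mean is not calculated and I end up with a gap of NaN values in the final csv file.
Pandas is not interpreting your dates correctly. Specify the format yourself.
Use the code below to fix your problem: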
import pandas as pd

# Parse the timestamps as month/day/year instead of letting pandas guess.
parser = lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')
df = pd.read_csv('data.csv', sep=',', skiprows=0,
                 parse_dates=['Date/Time'], date_parser=parser)
pd_mean = df.groupby(pd.Grouper(key='Date/Time', freq='H')).mean()
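To see why the explicit format matters, here is a minimal sketch. It assumes the raw file stores timestamps as month/day/year text; the sample string is made up for illustration and is not taken from the original file.

```python
import pandas as pd

# Hypothetical raw value, meant as 3 January 2021 (month/day/year).
stamp = "01/03/2021 00:00"

# With dayfirst=True the first field is read as the day: 1 March 2021.
as_day_first = pd.to_datetime(stamp, dayfirst=True)

# With an explicit month/day/year format the same text is 3 January 2021.
as_month_first = pd.to_datetime(stamp, format="%m/%d/%Y %H:%M")

print(as_day_first)    # 2021-03-01 00:00:00
print(as_month_first)  # 2021-01-03 00:00:00
```

The same text gives two different dates, which would explain rows jumping from February to March in the parsed data.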
Check your dates before the operation:
93 2021-02-01 23:00:00 # February, 1st
94 2021-02-01 23:30:00 # February, 1st
95 2021-03-01 00:00:00 # March, 1st
96 2021-03-01 00:30:00 # March, 1st
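One way to do that check in code is to look at the step between consecutive timestamps. This is only a sketch under assumptions: the four sample rows and the month/day/year layout are invented for illustration, and `60min` is used as the hourly bin size in place of `H`.

```python
import io
import pandas as pd

# Hypothetical raw rows in month/day/year order (2 and 3 January 2021).
raw = io.StringIO(
    "Date/Time,kWh\n"
    "01/02/2021 23:00,503.00\n"
    "01/02/2021 23:30,484.84\n"
    "01/03/2021 00:00,484.80\n"
    "01/03/2021 00:30,487.68\n"
)
df = pd.read_csv(raw)

# Day-first parsing turns these rows into 1 February and 1 March.
wrong = pd.to_datetime(df["Date/Time"], dayfirst=True)
# An explicit format keeps them on 2 and 3 January.
right = pd.to_datetime(df["Date/Time"], format="%m/%d/%Y %H:%M")

# Readings are 30 minutes apart, so a larger step points to a parsing problem.
largest_step_wrong = wrong.diff().max()
largest_step_right = right.diff().max()
print(largest_step_wrong)
print(largest_step_right)

# Hourly bins span the whole range, so the artificial jump becomes empty bins
# whose mean is NaN.
misparsed = df.copy()
misparsed["Date/Time"] = wrong
hourly = misparsed.groupby(pd.Grouper(key="Date/Time", freq="60min"))["kWh"].mean()
empty_bins = int(hourly.isna().sum())
print(empty_bins)
```

If the largest step is much bigger than the sampling interval, the dates were probably misread. Newer pandas releases (2.0 onward) deprecate `date_parser`, so parsing the column with `pd.to_datetime` after reading, as in this sketch, is one way to avoid that argument.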