Pandas: mean along dataframe.index (hour, minute) for a dataframe with a large amount of data (months) (Python)

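I want to analyze the seasonal variation of temperature. The dataframe below shows 3 days of data (I have a full year), and I am trying to get the mean for each hour of the day in the index, i.e. the mean of [2013-09-09 11:00:00+00:00, 2013-09-10 11:00:00+00:00, 2013-09-11 11:00:00+00:00], then the mean of [2013-09-09 12:00:00+00:00, 2013-09-10 12:00:00+00:00, 2013-09-11 12:00:00+00:00], and so on for every hour in the index (11h to 21h). The day (or month or year) attached to the result does not matter. For example, with the mean over 3 months of data (one season), I could plot the daily cycle for that season.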

                                tbr45       tbl45       tbr90       tbl90   
2013-09-09 11:00:00+00:00  505.323547  486.355286  595.912720  583.071960   
2013-09-09 12:00:00+00:00  508.812225  491.494843  627.731262  616.314941   
2013-09-09 13:00:00+00:00  540.965820  535.360657  637.981689  627.243225   
2013-09-09 14:00:00+00:00  583.952026  584.262878  637.378601  625.770691   
2013-09-09 15:00:00+00:00  652.799438  630.148438  635.095337  619.337158   
2013-09-09 16:00:00+00:00  687.486511  661.011414  614.061523  597.570068   
2013-09-09 17:00:00+00:00  690.432129  671.480835  614.834229  598.777344   
2013-09-09 18:00:00+00:00  664.571106  670.975525  596.734070  583.309143   
2013-09-09 19:00:00+00:00  636.534912  646.669556  593.567078  581.381348   
2013-09-09 20:00:00+00:00  604.899963  597.256653  611.781738  598.315979   
2013-09-09 21:00:00+00:00  526.182312  517.186646  592.611755  580.399963   
2013-09-10 11:00:00+00:00  507.257507  487.112335  596.071655  582.828552   
2013-09-10 12:00:00+00:00  514.157104  496.605927  629.022949  617.111145   
2013-09-10 13:00:00+00:00  543.978333  538.196045  639.798706  629.375244   
2013-09-10 14:00:00+00:00  587.493408  585.671875  641.292175  629.457397   
2013-09-10 15:00:00+00:00  651.846252  627.484863  635.671021  620.121277   
2013-09-10 16:00:00+00:00  686.149902  658.505676  613.066895  596.212952   
2013-09-10 17:00:00+00:00  691.174622  670.735596  611.395569  595.197021   
2013-09-10 18:00:00+00:00  667.054932  670.445007  597.292175  583.138916   
2013-09-10 19:00:00+00:00  639.944458  648.966736  592.374939  579.698120   
2013-09-10 20:00:00+00:00  603.718811  595.387939  611.565613  597.743958   
2013-09-10 21:00:00+00:00  527.942688  518.770142  592.729553  580.896545   
2013-09-11 11:00:00+00:00  513.669922  495.586151  591.327881  580.392639   
2013-09-11 12:00:00+00:00  515.318848  498.809570  623.302124  613.500732   
2013-09-11 13:00:00+00:00  535.538452  531.539246  627.050720  618.705933   
2013-09-11 14:00:00+00:00  582.224304  580.309814  630.984863  621.197205   
2013-09-11 15:00:00+00:00  651.124817  627.319275  624.498230  611.242371   
2013-09-11 16:00:00+00:00  687.329346  661.133118  601.152771  586.607910   
2013-09-11 17:00:00+00:00  692.142639  672.756165  602.042847  587.650757   
2013-09-11 18:00:00+00:00  665.618652  671.501221  587.212708  575.742493   
2013-09-11 19:00:00+00:00  637.342834  650.392517  581.707153  571.003113   
2013-09-11 20:00:00+00:00  605.727783  600.323730  605.188293  593.453613   
2013-09-11 21:00:00+00:00  532.850342  524.873962  585.960083  575.799561

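I tried reshaping the data and applying the mean along the corresponding axis. It works, but it is cumbersome for this amount of data, and I think pandas could help.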

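If your index consists of timestamps, the following approach should work. First, create a column holding the hour extracted from each timestamp. Then group by that column and take the mean.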

df['hour'] = df.index.hour  # hour of day (0-23) for each row; needs a DatetimeIndex

>>> df.groupby('hour').mean()
           tbr45       tbl45       tbr90       tbl90
hour                                                
11    508.750325  489.684591  594.437419  582.097717
12    512.762726  495.636780  626.685445  615.642273
13    540.160868  535.031983  634.943705  625.108134
14    584.556579  583.414856  636.551880  625.475098
15    651.923502  628.317525  631.754863  616.900269
16    686.988586  660.216736  609.427063  593.463643
17    691.249797  671.657532  609.424215  593.875041
18    665.748230  670.973918  593.746318  580.730184
19    637.940735  648.676270  589.216390  577.360860
20    604.782186  597.656107  609.511881  596.504517
21    528.991781  520.276917  590.433797  579.032023
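
As a variation on the steps above, the helper column can be skipped by grouping on the index's hour directly. The following is only a minimal sketch under stated assumptions: the sample data is synthetic (random values, two of the four columns), built here just to make the snippet self-contained, and it is not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three days of hourly readings from 11:00 to 21:00 UTC.
days = pd.date_range("2013-09-09", periods=3, freq="D", tz="UTC")
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=h) for day in days for h in range(11, 22)]
)

# Synthetic values standing in for the real measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(480, 700, size=(len(index), 2)),
    index=index,
    columns=["tbr45", "tbl45"],
)

# Group by the hour of each timestamp; no extra column is added to df.
hourly_mean = df.groupby(df.index.hour).mean()
hourly_mean.index.name = "hour"
print(hourly_mean)
```

Each row of hourly_mean is the mean over all days for that hour, so with a year of data the same call should give one row per hour of the day present in the index.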

If your index holds strings, you must convert them to timestamps first.

df.index = pd.to_datetime(df.index)
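
For the seasonal use case in the question, one possible way to combine the conversion with the hourly mean is to keep only the months of one season before grouping. This is a sketch, not the only approach: the tiny dataframe, the choice of September to November as the season, and the names autumn and autumn_profile are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: the index holds timestamp strings, not Timestamps.
raw_index = [
    "2013-09-09 11:00:00+00:00",
    "2013-09-09 12:00:00+00:00",
    "2013-10-09 11:00:00+00:00",
    "2013-10-09 12:00:00+00:00",
    "2013-12-09 11:00:00+00:00",  # outside the assumed season
]
df = pd.DataFrame({"tbr45": [500.0, 510.0, 520.0, 530.0, 900.0]}, index=raw_index)

# Convert the string index to a DatetimeIndex so .month and .hour are available.
df.index = pd.to_datetime(df.index)

# Keep only the months of the assumed season (September to November).
autumn = df[df.index.month.isin([9, 10, 11])]

# Mean per hour of day over the whole season.
autumn_profile = autumn.groupby(autumn.index.hour).mean()
print(autumn_profile)
```

The resulting frame can then be plotted, for example with autumn_profile.plot(), to show the daily cycle for that season.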