Pandas: mean along dataframe.index (hour, minute) for a dataframe with many months of data (Python)
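I want to analyze seasonal variation in temperature. The dataframe below shows 3 days of data (I have a full year), so I am trying to get the mean for each hour of the index, i.e. the mean of [2013-09-09 11:00:00+00:00, 2013-09-10 11:00:00+00:00 and 2013-09-11 11:00:00+00:00], then of [2013-09-09 12:00:00+00:00, 2013-09-10 12:00:00+00:00 and 2013-09-11 12:00:00+00:00], and so on for every hour in the index (11h to 21h). The day (or month, or year) attached to the result does not matter. For example, with the mean over 3 months of data (one season) I could plot the diurnal variation for that season.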
tbr45 tbl45 tbr90 tbl90
2013-09-09 11:00:00+00:00 505.323547 486.355286 595.912720 583.071960
2013-09-09 12:00:00+00:00 508.812225 491.494843 627.731262 616.314941
2013-09-09 13:00:00+00:00 540.965820 535.360657 637.981689 627.243225
2013-09-09 14:00:00+00:00 583.952026 584.262878 637.378601 625.770691
2013-09-09 15:00:00+00:00 652.799438 630.148438 635.095337 619.337158
2013-09-09 16:00:00+00:00 687.486511 661.011414 614.061523 597.570068
2013-09-09 17:00:00+00:00 690.432129 671.480835 614.834229 598.777344
2013-09-09 18:00:00+00:00 664.571106 670.975525 596.734070 583.309143
2013-09-09 19:00:00+00:00 636.534912 646.669556 593.567078 581.381348
2013-09-09 20:00:00+00:00 604.899963 597.256653 611.781738 598.315979
2013-09-09 21:00:00+00:00 526.182312 517.186646 592.611755 580.399963
2013-09-10 11:00:00+00:00 507.257507 487.112335 596.071655 582.828552
2013-09-10 12:00:00+00:00 514.157104 496.605927 629.022949 617.111145
2013-09-10 13:00:00+00:00 543.978333 538.196045 639.798706 629.375244
2013-09-10 14:00:00+00:00 587.493408 585.671875 641.292175 629.457397
2013-09-10 15:00:00+00:00 651.846252 627.484863 635.671021 620.121277
2013-09-10 16:00:00+00:00 686.149902 658.505676 613.066895 596.212952
2013-09-10 17:00:00+00:00 691.174622 670.735596 611.395569 595.197021
2013-09-10 18:00:00+00:00 667.054932 670.445007 597.292175 583.138916
2013-09-10 19:00:00+00:00 639.944458 648.966736 592.374939 579.698120
2013-09-10 20:00:00+00:00 603.718811 595.387939 611.565613 597.743958
2013-09-10 21:00:00+00:00 527.942688 518.770142 592.729553 580.896545
2013-09-11 11:00:00+00:00 513.669922 495.586151 591.327881 580.392639
2013-09-11 12:00:00+00:00 515.318848 498.809570 623.302124 613.500732
2013-09-11 13:00:00+00:00 535.538452 531.539246 627.050720 618.705933
2013-09-11 14:00:00+00:00 582.224304 580.309814 630.984863 621.197205
2013-09-11 15:00:00+00:00 651.124817 627.319275 624.498230 611.242371
2013-09-11 16:00:00+00:00 687.329346 661.133118 601.152771 586.607910
2013-09-11 17:00:00+00:00 692.142639 672.756165 602.042847 587.650757
2013-09-11 18:00:00+00:00 665.618652 671.501221 587.212708 575.742493
2013-09-11 19:00:00+00:00 637.342834 650.392517 581.707153 571.003113
2013-09-11 20:00:00+00:00 605.727783 600.323730 605.188293 593.453613
2013-09-11 21:00:00+00:00 532.850342 524.873962 585.960083 575.799561
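I tried reshaping the data and taking the mean along the corresponding axis. It works, but it is laborious for this amount of data, and I think pandas could help.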
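If your index consists of timestamps, the following should work. First, create a column holding the hour extracted from each timestamp. Then simply group by hour and take the mean.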
>>> df['hour'] = df.index.hour  # hour of day taken from each timestamp in the index
>>> df.groupby('hour').mean()   # average every column over all rows sharing that hour
tbr45 tbl45 tbr90 tbl90
hour
11 508.750325 489.684591 594.437419 582.097717
12 512.762726 495.636780 626.685445 615.642273
13 540.160868 535.031983 634.943705 625.108134
14 584.556579 583.414856 636.551880 625.475098
15 651.923502 628.317525 631.754863 616.900269
16 686.988586 660.216736 609.427063 593.463643
17 691.249797 671.657532 609.424215 593.875041
18 665.748230 670.973918 593.746318 580.730184
19 637.940735 648.676270 589.216390 577.360860
20 604.782186 597.656107 609.511881 596.504517
21 528.991781 520.276917 590.433797 579.032023
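Since the title also mentions minutes, here is a minimal sketch of the same idea that groups on the index directly, with no helper column. The sample frame is synthetic (random values, not the data from the question), and the names `hourly_mean` and `time_of_day_mean` are only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: three days of hourly
# readings between 11:00 and 21:00 UTC (values are random, not real).
days = pd.date_range("2013-09-09", periods=3, freq="D", tz="UTC")
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=h) for day in days for h in range(11, 22)]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(480, 700, size=(len(index), 4)),
    index=index,
    columns=["tbr45", "tbl45", "tbr90", "tbl90"],
)

# Group on the hour of the index directly; no extra column is added to df.
hourly_mean = df.groupby(df.index.hour).mean()

# For sub-hourly data, group on (hour, minute) so that each time of day
# gets its own row in the result.
time_of_day_mean = df.groupby([df.index.hour, df.index.minute]).mean()
```

With hourly data the two results have the same rows. The second form only makes a difference once the index contains several readings per hour.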
If your index consists of strings, you must convert them to timestamps first.
df.index = pd.to_datetime(df.index)
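To tie this back to the seasonal question, the sketch below converts a string index and then keeps only one season before averaging by hour. The tiny frame, the choice of September to November as the season, and the `utc=True` argument are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame whose index holds timestamp strings, as it might
# look right after reading a text file.
df = pd.DataFrame(
    {"tbr45": [505.3, 508.8, 507.3, 514.2, 600.0]},
    index=[
        "2013-09-09 11:00:00+00:00",
        "2013-09-09 12:00:00+00:00",
        "2013-09-10 11:00:00+00:00",
        "2013-09-10 12:00:00+00:00",
        "2013-12-01 11:00:00+00:00",
    ],
)

# Convert the strings to a timezone-aware DatetimeIndex.
df.index = pd.to_datetime(df.index, utc=True)

# Keep one season (September to November here, an arbitrary choice),
# then average each column by hour of day within that season.
season = df[df.index.month.isin([9, 10, 11])]
seasonal_hourly_mean = season.groupby(season.index.hour).mean()
```

The December row is dropped by the month filter, so it does not affect the seasonal means. Each row of `seasonal_hourly_mean` can then be plotted against the hour to show the diurnal variation for that season.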