Group data by month across 6 years

I have a CSV file containing 6 years of data, from 01/01/2006 to 01/01/2011, and I need to group the data by each month across those 6 years. Here is an overview of my CSV file:

 timestamp,heure,lat,lon,impact,type
 2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
 2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
 2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
 2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
 2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
 2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
 ....
 2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

Here is the desired output:

 month 01   10 (count of rows for that month)
 month 02   20
 .....
 month 12   30

Any ideas?
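For reference, a minimal sketch of loading such a file into pandas; the filename data.csv is an assumption:

import pandas as pd

# "data.csv" is a placeholder name; parse the timestamp column as datetimes
# and use it as the index so it can later be grouped by month
df = pd.read_csv("data.csv", parse_dates=["timestamp"], index_col="timestamp")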

Consider the sample dataframe df:

import numpy as np
import pandas as pd

np.random.seed([3,1415])

tidx = pd.date_range('2006-01-01', '2011-01-01', name='Date')

df = pd.DataFrame(dict(
        heure=pd.to_timedelta(np.random.randint(24*60*60, size=len(tidx))),
        lat=np.random.rand(len(tidx)) * 10 + 30,
        lon=np.random.rand(len(tidx)) * 10 - 20,
        impact=np.random.rand(len(tidx)),
        type=np.random.randint(3, size=len(tidx))
    ), tidx)

df.head()

                     heure    impact        lat        lon  type
Date                                                            
2006-01-01 00:00:00.000037  0.312643  39.324254 -14.715073     1
2006-01-02 00:00:00.000019  0.121450  30.560726 -10.879014     0
2006-01-03 00:00:00.000060  0.080082  38.489212 -11.899611     1
2006-01-04 00:00:00.000021  0.270159  34.832683 -14.924849     0
2006-01-05 00:00:00.000066  0.112194  32.193704 -19.083123     0

Group by df.index.month:

df.groupby(df.index.month).size()

Date
1     156
2     141
3     155
4     150
5     155
6     150
7     155
8     155
9     150
10    155
11    150
12    155
dtype: int64
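To get labels in the exact "month 01" form shown in the desired output, the integer index can be reformatted; one possible sketch:

counts = df.groupby(df.index.month).size()
counts.index = counts.index.map('month {:02d}'.format)  # e.g. 'month 01', 'month 02', ...
counts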

You can do everything you would normally do with a groupby... Here is an example using describe:

df.groupby(df.index.month).impact.describe()

      count      mean       std       min       25%       50%       75%       max
Date                                                                             
1     156.0  0.529216  0.279498  0.003298  0.292654  0.538437  0.774256  0.998507
2     141.0  0.501540  0.295111  0.001063  0.243723  0.491919  0.727560  0.999231
3     155.0  0.516168  0.306878  0.001178  0.227668  0.556316  0.783676  0.997126
4     150.0  0.472035  0.263685  0.004031  0.246738  0.491169  0.665894  0.987965
5     155.0  0.523897  0.320709  0.003486  0.221323  0.538594  0.841909  0.994280
6     150.0  0.542496  0.297215  0.003550  0.273098  0.589802  0.807086  0.995538
7     155.0  0.513857  0.285404  0.000933  0.285383  0.519170  0.746735  0.999551
8     155.0  0.516404  0.284407  0.004662  0.288900  0.545429  0.739392  0.996601
9     150.0  0.490965  0.299312  0.011958  0.206851  0.487708  0.737785  0.993217
10    155.0  0.513743  0.304779  0.010712  0.199390  0.563746  0.796143  0.995488
11    150.0  0.465428  0.271936  0.006345  0.221753  0.470793  0.684867  0.995886
12    155.0  0.498415  0.301704  0.004538  0.215730  0.471139  0.757360  0.997268
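If you prefer month names instead of numbers, you can group on a formatted index; a small sketch (note the groups then sort alphabetically unless you reorder them):

# strftime('%B') yields full month names such as 'January', 'February', ...
df.groupby(df.index.strftime('%B')).impact.describe()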

This will work:

df["month"]=df["timestamp"].dt.month
df.groupby(["month"].size()

I mostly use resample for this.

Here is my example:

import numpy as np
import pandas as pd
index = pd.date_range('2017/1/1', '2017/10/1')
df = pd.DataFrame(np.ones((274, 1)), index)
df
          0
2017-01-01  1.0
2017-01-02  1.0
...         ...
2017-09-29  1.0
2017-09-30  1.0
2017-10-01  1.0

df.resample('M').count()  # use resample to agg data
2017-01-31  31
2017-02-28  28
2017-03-31  31
2017-04-30  30
2017-05-31  31
2017-06-30  30
2017-07-31  31
2017-08-31  31
2017-09-30  30
2017-10-31   1
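Note that resample('M') produces one row per month of each year. To collapse those into the per-calendar-month totals the question asks for, the resampled result can be grouped again by month; a sketch:

monthly = df.resample('M').count()
# sum all Januaries together, all Februaries together, and so on
monthly.groupby(monthly.index.month).sum()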