Group data by month across 6 years
I have a CSV file containing 6 years of data, from 01/01/2006 to 01/01/2011, and I need to group the data by each month across those 6 years.
Here is an overview of my CSV file:
timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
This is the desired output:
month 01 10 (counts of the columns)
month 02 20
.....
month 12 30
Any ideas?
Consider the sample dataframe df:
import numpy as np
import pandas as pd

np.random.seed([3,1415])
# one row per day; the dates become the index
tidx = pd.date_range('2006-01-01', '2011-01-01', name='Date')
df = pd.DataFrame(dict(
    heure=pd.to_timedelta(np.random.randint(24*60*60, size=len(tidx))),
    lat=np.random.rand(len(tidx)) * 10 + 30,
    lon=np.random.rand(len(tidx)) * 10 - 20,
    impact=np.random.rand(len(tidx)),
    type=np.random.randint(3, size=len(tidx))
), tidx)
df.head()
heure impact lat lon type
Date
2006-01-01 00:00:00.000037 0.312643 39.324254 -14.715073 1
2006-01-02 00:00:00.000019 0.121450 30.560726 -10.879014 0
2006-01-03 00:00:00.000060 0.080082 38.489212 -11.899611 1
2006-01-04 00:00:00.000021 0.270159 34.832683 -14.924849 0
2006-01-05 00:00:00.000066 0.112194 32.193704 -19.083123 0
Group by df.index.month:
df.groupby(df.index.month).size()
Date
1 156
2 141
3 155
4 150
5 155
6 150
7 155
8 155
9 150
10 155
11 150
12 155
dtype: int64
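To get closer to the layout asked for in the question (a zero-padded month label followed by a count), the result of size() can simply be printed with a format string. This is only a minimal sketch: the frame below is an illustrative stand-in with one row per day over six full years, not the asker's real data, and the name counts is made up for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): one row per day, 2006 through 2011.
tidx = pd.date_range('2006-01-01', '2011-12-31', name='Date')
df = pd.DataFrame({'impact': np.zeros(len(tidx))}, index=tidx)

# Count rows per calendar month, pooling all six years together.
counts = df.groupby(df.index.month).size()

# Print in the "month 01 <count>" layout from the question.
for month, n in counts.items():
    print(f"month {int(month):02d} {int(n)}")
```

Because the grouping key is the month number alone, every January from 2006 to 2011 lands in the same group.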
You can do everything you would normally do with a groupby. Here is an example using describe:
df.groupby(df.index.month).impact.describe()
count mean std min 25% 50% 75% max
Date
1 156.0 0.529216 0.279498 0.003298 0.292654 0.538437 0.774256 0.998507
2 141.0 0.501540 0.295111 0.001063 0.243723 0.491919 0.727560 0.999231
3 155.0 0.516168 0.306878 0.001178 0.227668 0.556316 0.783676 0.997126
4 150.0 0.472035 0.263685 0.004031 0.246738 0.491169 0.665894 0.987965
5 155.0 0.523897 0.320709 0.003486 0.221323 0.538594 0.841909 0.994280
6 150.0 0.542496 0.297215 0.003550 0.273098 0.589802 0.807086 0.995538
7 155.0 0.513857 0.285404 0.000933 0.285383 0.519170 0.746735 0.999551
8 155.0 0.516404 0.284407 0.004662 0.288900 0.545429 0.739392 0.996601
9 150.0 0.490965 0.299312 0.011958 0.206851 0.487708 0.737785 0.993217
10 155.0 0.513743 0.304779 0.010712 0.199390 0.563746 0.796143 0.995488
11 150.0 0.465428 0.271936 0.006345 0.221753 0.470793 0.684867 0.995886
12 155.0 0.498415 0.301704 0.004538 0.215730 0.471139 0.757360 0.997268
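If month names are preferred over numbers in the result, one option is to group by the month number first (so the rows stay in calendar order) and relabel afterwards. The sketch below assumes a frame shaped like the sample one; the random values, the name stats, and the choice of aggregations are illustrative only, and the names returned by the standard-library calendar module follow the current locale.

```python
import calendar
import numpy as np
import pandas as pd

# Illustrative data (an assumption): daily rows with a random 'impact' column.
tidx = pd.date_range('2006-01-01', '2011-01-01', name='Date')
rng = np.random.default_rng(0)
df = pd.DataFrame({'impact': rng.random(len(tidx))}, index=tidx)

# Aggregate by month number so the result is sorted January..December.
stats = df.groupby(df.index.month)['impact'].agg(['count', 'mean', 'max'])

# Swap the numeric labels for month names after the sort has happened.
stats.index = [calendar.month_name[int(m)] for m in stats.index]
print(stats)
```

Grouping directly on month-name strings would sort the groups alphabetically instead, which is why the relabelling is done as a separate last step here.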
This will work:
# assumes df["timestamp"] has a datetime dtype (e.g. parsed when loading the CSV)
df["month"] = df["timestamp"].dt.month
df.groupby("month").size()
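For completeness, here is a small end-to-end sketch of that approach starting from CSV text. It reuses a few rows from the question and reads them through io.StringIO; with a real file, the path would be passed to read_csv instead. The parse_dates argument is what gives the timestamp column a datetime dtype, which the .dt accessor requires.

```python
import io
import pandas as pd

# A few rows copied from the question's sample data.
csv_text = """timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

# Parse the timestamp column as datetimes while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# Extract the month number and count rows per month across all years.
df['month'] = df['timestamp'].dt.month
counts = df.groupby('month').size()
print(counts)
```

With these six rows, the January rows from 2006 and 2008 are counted together, and the two February 2007 rows form the second group.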
I mostly use resample for this.
Here is my example:
import numpy as np
import pandas as pd
index = pd.date_range('2017/1/1', '2017/10/1')
df = pd.DataFrame(np.ones((274, 1)), index)
df
0
2017-01-01 1.0
2017-01-02 1.0
... ...
2017-09-29 1.0
2017-09-30 1.0
2017-10-01 1.0
df.resample('M').count() # use resample to agg data
2017-01-31 31
2017-02-28 28
2017-03-31 31
2017-04-30 30
2017-05-31 31
2017-06-30 30
2017-07-31 31
2017-08-31 31
2017-09-30 30
2017-10-31 1
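Note that resampling by month produces one row per month of each year, whereas the question asks for the twelve months pooled across years. The sketch below shows one way to go from the first to the second; the two-year frame and the names per_year_month and per_month are illustrative assumptions. It groups on to_period('M') rather than calling resample, partly because newer pandas releases prefer the 'ME' alias over 'M' for month-end resampling.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): one row per day for 2006 and 2007.
index = pd.date_range('2006-01-01', '2007-12-31')
df = pd.DataFrame({'value': np.ones(len(index))}, index=index)

# One group per (year, month) pair, similar to a monthly resample.
per_year_month = df.groupby(df.index.to_period('M')).size()

# Fold the years together: group the monthly counts by month number.
per_month = per_year_month.groupby(per_year_month.index.month).sum()

print(len(per_year_month), len(per_month))
```

The first result has 24 rows (two years times twelve months) and the second has 12, one per calendar month.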