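Group data by season according to the exact dates
I have a CSV file containing 4 years of data, and I am trying to group the data by season across all 4 years. In other words, I need to aggregate my whole dataset into just 4 seasons and plot it.
Here is my data file: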
timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
....
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
This is the output I want:
winter (the mean value of impacts)
summer (the mean value of impacts)
autumn ....
spring .....
This is the code I tried:
import pandas as pd

names = ["timestamp", "heure", "lat", "lon", "impact", "type"]
data = pd.read_csv('flash.txt', names=names, parse_dates=['timestamp'], index_col=['timestamp'], dayfirst=True)

# Day-of-year ranges for each season; everything else counts as winter
spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)

def season(x):
    if x in spring:
        return 'Spring'
    if x in summer:
        return 'Summer'
    if x in fall:
        return 'Fall'
    else:
        return 'Winter'

data['SEASON'] = data.index.to_series().dt.month.map(lambda x : season(x))
data['impact'] = data['impact'].abs()
seasonly = data.groupby('SEASON')['impact'].mean()
I got this terrible result:
Where did I go wrong?
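It looks like this line:
data['SEASON'] = data.index.to_series().dt.**month**.map(lambda x : season(x))
is using the month, which is 1-12 in pandas, so every value falls outside your ranges and gets labelled 'Winter'.
You need to use the day of the year instead.
You might have spotted this more easily if you had not packed the date extraction into a one-liner, so that you could print the intermediate values and check them yourself. Just saying.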
data['SEASON'] = data.index.dayofyear.map(season)
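To see the difference concretely, here is a minimal sketch that runs the question's season() function on both the month and the day of year. The four sample dates are hypothetical, picked so that one falls in each season; they are not taken from the data file.

```python
import pandas as pd

# Day-of-year ranges from the question
spring = range(80, 172)
summer = range(172, 264)
fall = range(264, 355)

def season(x):
    if x in spring:
        return 'Spring'
    if x in summer:
        return 'Summer'
    if x in fall:
        return 'Fall'
    return 'Winter'

# Hypothetical sample dates, one per season (2006 is not a leap year)
idx = pd.to_datetime(['2006-01-15', '2006-04-15', '2006-07-15', '2006-10-15'])

# Months are 1-12, so none of them ever reaches the 80+ ranges
by_month = [season(m) for m in idx.month]

# Days of year are 15, 105, 196 and 288, which hit the intended ranges
by_doy = [season(d) for d in idx.dayofyear]

print(by_month)
print(by_doy)
```

Printing the intermediate values like this is the kind of check that makes the bug visible straight away.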
Another solution, using pandas.cut:
bins = [0, 91, 183, 275, 366]
labels=['Winter', 'Spring', 'Summer', 'Fall']
doy = data.index.dayofyear
data['SEASON1'] = pd.cut(doy + 11 - 366*(doy > 355), bins=bins, labels=labels)
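pandas.cut
To handle 'Winter' correctly at both the start and the end of the year, I shift dayofyear by 11 and take the result modulo 366. The reason I don't use the same technique as in the numpy solution below is that pd.cut returns a categorical type, so I would end up with 5 categories, two of which share the same label. I could then convert the result to strings, but that feels sloppy.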
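data['SEASON'] = pd.cut(
    (data.index.dayofyear + 11) % 366,
    [0, 91, 183, 275, 366],
    labels=['Winter', 'Spring', 'Summer', 'Fall'],
    # Day 355 shifts to exactly 0; without include_lowest it would
    # fall outside the left-open first bin and become NaN
    include_lowest=True
)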
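A quick boundary check of the shift-and-modulo idea may be useful here. The sketch below uses hypothetical dates from a non-leap year and compares pd.cut with its default settings against pd.cut with include_lowest=True. With the defaults the first bin is (0, 91], so day 355, which shifts to exactly 0, gets no label.

```python
import pandas as pd

bins = [0, 91, 183, 275, 366]
labels = ['Winter', 'Spring', 'Summer', 'Fall']

# Hypothetical dates; their days of year are 1, 81, 182, 274, 355, 365
idx = pd.to_datetime(['2007-01-01', '2007-03-22', '2007-07-01',
                      '2007-10-01', '2007-12-21', '2007-12-31'])

# Shift by 11 and wrap around, so late December lands at the start
shifted = (idx.dayofyear + 11) % 366   # 12, 92, 193, 285, 0, 10

# Default bins are left-open, so the value 0 is not covered
default = pd.cut(shifted, bins, labels=labels)

# include_lowest=True closes the first bin on the left: [0, 91]
closed = pd.cut(shifted, bins, labels=labels, include_lowest=True)

print(list(default))
print(list(closed))
```

Either way the result keeps exactly four categories, which is the point of shifting instead of giving 'Winter' a second bin.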
numpy.searchsorted
To handle 'Winter' correctly at both the start and the end of the year, I give 'Winter' two bins:
seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
f = np.searchsorted([80, 172, 264, 355], data.index.dayofyear)
data['SEASON'] = seasons[f]
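One detail worth checking is what happens exactly on the boundary days. The sketch below runs np.searchsorted on a handful of hand-picked day-of-year values, once with the default side='left' and once with side='right'. The variable names are only illustrative. With side='right', days 80, 172, 264 and 355 get the same labels as the range() definitions in the question; with the default, each of those days still belongs to the previous season.

```python
import numpy as np

seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
edges = [80, 172, 264, 355]

# Hand-picked days of year around each boundary
days = np.array([1, 80, 81, 172, 264, 355, 356, 366])

# Default side='left': a day equal to an edge stays in the earlier bin
left = seasons[np.searchsorted(edges, days)]

# side='right': a day equal to an edge moves into the later bin
right = seasons[np.searchsorted(edges, days, side='right')]

print(list(left))
print(list(right))
```

Which side to use depends on where you want each season to start; the two variants only differ on the four boundary days.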
plot
data.groupby('SEASON')['impact'].mean().plot.bar()
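For completeness, here is a small end-to-end sketch without the plot. It assumes the file has the header row shown in the question and uses only the seven rows that are visible there, so every row lands in 'Winter'. It also assumes side='right' in np.searchsorted so that the boundaries match the question's ranges.

```python
import io

import numpy as np
import pandas as pd

# The visible rows from the question's data file (the elided rows are omitted)
csv_text = """timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'],
                   index_col='timestamp')

# Map each day of year to a season label; 'Winter' owns the first and last bin
seasons = np.array(['Winter', 'Spring', 'Summer', 'Fall', 'Winter'])
pos = np.searchsorted([80, 172, 264, 355], data.index.dayofyear, side='right')
data['SEASON'] = seasons[pos]

# Mean absolute impact per season, pooled over all years
seasonal_mean = data['impact'].abs().groupby(data['SEASON']).mean()
print(seasonal_mean)
```

On the full file the resulting Series should have one entry per season, and calling .plot.bar() on it gives the chart.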