Python: Calculate average for each hour in CSV?
I want to calculate the average for each hour using a CSV file.
Below is my data set:
Timestamp Temperature
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
9/1/2016 0:03:08 54.1
9/1/2016 0:03:38 54.1
9/1/2016 0:04:38 54
9/1/2016 0:05:38 54
9/1/2016 0:06:08 54
9/1/2016 0:06:38 54
9/1/2016 0:07:08 54
9/1/2016 0:07:38 54
9/1/2016 0:08:08 54.1
9/1/2016 0:08:38 54.1
9/1/2016 0:09:38 54.1
9/1/2016 0:10:32 54
9/1/2016 0:11:02 54
9/1/2016 0:11:32 54
9/1/2016 0:00:08 54
9/2/2016 0:00:20 32
9/2/2016 0:00:50 32
9/2/2016 0:01:20 32
9/2/2016 0:01:50 32
9/2/2016 0:02:20 32
9/2/2016 0:02:50 32
9/2/2016 0:03:20 32
9/2/2016 0:03:50 32
9/2/2016 0:04:20 32
9/2/2016 0:04:50 32
9/2/2016 0:05:20 32
9/2/2016 0:05:50 32
9/2/2016 0:06:20 32
9/2/2016 0:06:50 32
9/2/2016 0:07:20 32
9/2/2016 0:07:50 32
Here is my code, which calculates the average for each day, but I want it for each hour:
from datetime import datetime
import pandas
def same_day(date_string):  # Keep only month and day (drops year and time)
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')
df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv', index_col=0, usecols=[0, 1], names=['Timestamp', 'Discharge'], converters={'Timestamp': same_day})
print(df.groupby(level=0).mean())
The output I want looks like this:
Timestamp Temp * Avg
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8 ?avg for this hour
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8 ?avg for this hour
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
Now I want the average and minimum for a specific hour.
Desired output:
Here I am printing only 5 hours of output for the dates 01-09-2016 and 02-09-16:
010900 54.362727 45.497273
010901 54.723276 45.068103
010902 54.746847 45.370270
010903 54.833913 44.931304
010904 54.971053 44.835088
010905 55.519444 44.459259
020901 31.742553 55.640426
020902 31.495556 55.655556
020903 31.304348 55.442609
020904 31.200000 55.437273
020905 31.294382 55.442697
That is, for a specific date together with a specific hour.
How can I achieve this?
I think you first need read_csv with the parameter index_col=[0] to read the first column into the index, and parse_dates=[0] to parse that column into a DatetimeIndex:
import pandas as pd
df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])
print (df)
Temperature
Timestamp
2016-09-01 00:00:08 53.8
2016-09-01 00:00:38 53.8
2016-09-01 00:01:08 53.8
2016-09-01 00:01:38 53.8
2016-09-01 00:02:08 53.8
2016-09-01 00:02:38 54.1
2016-09-01 00:03:08 54.1
...
...
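If the automatic date parsing is slow or ambiguous about month and day order, one option is to parse the column explicitly. The following is only a sketch: the inline sample is a hypothetical stand-in for the real file, and it assumes the file is comma-separated with a header row.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the CSV file
# (comma-separated with a header row is an assumption).
sample = io.StringIO(
    "Timestamp,Temperature\n"
    "9/1/2016 0:00:08,53.8\n"
    "9/1/2016 0:02:38,54.1\n"
    "9/2/2016 0:00:20,32\n"
)

df = pd.read_csv(sample)
# Parse month/day/year explicitly instead of relying on inference.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')
print(df.index)
```

With an explicit format, 9/1/2016 is always read as 1 September rather than 9 January.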
Then use resample by hour and aggregate with Resampler.mean; hours with no data in the DatetimeIndex come out as NaN:
print (df.resample('H').mean())
Temperature
Timestamp
2016-09-01 00:00:00 53.980952
2016-09-01 01:00:00 NaN
2016-09-01 02:00:00 NaN
2016-09-01 03:00:00 NaN
2016-09-01 04:00:00 NaN
2016-09-01 05:00:00 NaN
2016-09-01 06:00:00 NaN
2016-09-01 07:00:00 NaN
2016-09-01 08:00:00 NaN
2016-09-01 09:00:00 NaN
2016-09-01 10:00:00 NaN
2016-09-01 11:00:00 NaN
2016-09-01 12:00:00 NaN
2016-09-01 13:00:00 NaN
2016-09-01 14:00:00 NaN
2016-09-01 15:00:00 NaN
2016-09-01 16:00:00 NaN
2016-09-01 17:00:00 NaN
2016-09-01 18:00:00 NaN
2016-09-01 19:00:00 NaN
2016-09-01 20:00:00 NaN
2016-09-01 21:00:00 NaN
2016-09-01 22:00:00 NaN
2016-09-01 23:00:00 NaN
2016-09-02 00:00:00 32.000000
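If the empty hours are not wanted, they can be dropped after resampling. This is a minimal sketch on a hypothetical inline sample (comma-separated input is an assumption); it uses the lowercase 'h' alias, since recent pandas releases deprecate the uppercase 'H'.

```python
import io
import pandas as pd

# Hypothetical sample; the real data would come from the CSV file.
sample = io.StringIO(
    "Timestamp,Temperature\n"
    "9/1/2016 0:00:08,53.8\n"
    "9/1/2016 0:02:38,54.1\n"
    "9/1/2016 1:15:00,55.0\n"
    "9/2/2016 0:00:20,32\n"
)

df = pd.read_csv(sample)
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Hourly mean; dropna() removes the hours that had no readings.
hourly = df.resample('h').mean().dropna()
print(hourly)
```

Only the three hours that actually contain readings remain in hourly.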
Another solution is to remove the minutes and seconds by casting to hours, then groupby the resulting array:
print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00']
print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
Temperature
2016-09-01 53.980952
2016-09-02 32.000000
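The same truncation can probably be written with DatetimeIndex.floor, which keeps the result as a DatetimeIndex instead of a raw NumPy array. A sketch under the same assumptions as the data above (hypothetical inline sample, comma-separated):

```python
import io
import pandas as pd

# Hypothetical sample; the real data would come from the CSV file.
sample = io.StringIO(
    "Timestamp,Temperature\n"
    "9/1/2016 0:00:08,53.8\n"
    "9/1/2016 0:02:38,54.1\n"
    "9/1/2016 1:15:00,55.0\n"
    "9/2/2016 0:00:20,32\n"
)

df = pd.read_csv(sample)
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Round every timestamp down to the start of its hour, then average per hour.
hourly = df.groupby(df.index.floor('h')).mean()
print(hourly)
```

Unlike resample, grouping this way only produces rows for hours that appear in the data, so no NaN rows are created.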
Alternatively, if you need the mean by month, day and hour, you can groupby the result of DatetimeIndex.strftime:
print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200']
print (df.groupby([df.index.strftime('%m%d%H')]).mean())
Temperature
090100 53.980952
090200 32.000000
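The desired output in the question has day-first keys such as 010900 and asks for both the average and the minimum. One way this might be done is to change the format string to '%d%m%H' and aggregate with several functions at once. This is a sketch on a hypothetical inline sample, not the asker's real file:

```python
import io
import pandas as pd

# Hypothetical sample; the real data would come from the CSV file.
sample = io.StringIO(
    "Timestamp,Temperature\n"
    "9/1/2016 0:00:08,53.8\n"
    "9/1/2016 0:02:38,54.1\n"
    "9/1/2016 1:15:00,55.0\n"
    "9/2/2016 0:00:20,32\n"
)

df = pd.read_csv(sample)
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')

# Day-month-hour keys, with mean and min computed for each key.
keys = df.index.strftime('%d%m%H')
stats = df.groupby(keys)['Temperature'].agg(['mean', 'min'])
print(stats)
```

Each row of stats is keyed by day, month and hour, with one column per aggregate.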
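Or, if you need to group by hour only, groupby DatetimeIndex.hour. Note that this pools the same hour of day across all dates, which is why the single result below mixes the readings from both days: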
print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
print (df.groupby([df.index.hour]).mean())
Temperature
0 44.475676
I would first define a new column hour for readability, then groupby it:
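import pandas as pd
# DataFrame.from_csv has been removed from pandas; read_csv is the replacement
df = pd.read_csv('/home/kk/Desktop/cal_Avg.csv', index_col=None)
# Strip the trailing ":MM:SS" so that only the date and hour remain
df['hour'] = df['Timestamp'].str[:-6]
print(df[['hour', 'Temperature']].groupby('hour').mean())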