计算熊猫多列事件数据中的 5 天移动平均线并对其进行排序
Calculate & Rank 5-Day Moving Averages in Panda Multiple Column Event Data
我需要使用表征“事件”的多个 panda 列查找每小时数据从最高到最低的 5 天平均值并对其进行排名。我在 df 数据框中的数据如下所示。结果将显示连续 5 天平均正风、最低温度和最高相对湿度的排名连续日“事件”。我不确定如何计算连续 5 天的平均值,然后如何根据多个条件进行排名。谢谢!
site year month day wind temp rh
0 A 1991 1 1 5.3 2.1 80.4
1 A 1991 1 2 12.6 -1.4 85.0
2 A 1991 1 3 14.7 -2.6 95.1
3 A 1991 1 4 11.8 4.8 57.3
4 A 1991 1 5 5.2 2.9 45.9
5 A 1991 1 6 3.9 4.3 52.1
6 A 1991 1 7 2.6 5.8 34.7
7 A 1991 1 8 2.9 5.7 29.2
8 A 1991 1 9 10.4 1.4 69.4
9 A 1991 1 10 14.6 -0.9 72.1
10 A 1991 1 11 13.9 -1.6 84.6
11 A 1991 1 12 14.5 -5.1 87.2
12 A 1991 1 13 12.8 -6.7 80.9
13 A 1991 1 14 8.4 -4.3 54.3
14 A 1991 1 15 5.7 0.7 44.8
我试过像下面这样使用不同的滚动平均值选项,但出现“列表分配索引超出范围”错误:
df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh' ]).rolling(window=5).mean()
5 天滚动平均值应如下所示:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
6 A 1991 1 7 7.64 3.04 57.02
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
9 A 1991 1 10 6.88 3.26 51.5
10 A 1991 1 11 8.88 2.08 58
11 A 1991 1 12 11.26 -0.1 68.5
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
14 A 1991 1 15 11.06 -3.4 70.36
并且,最终输出应如下所示,风、温度、湿度的优先级顺序如下:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.1 68.5
14 A 1991 1 15 11.06 -3.4 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
10 A 1991 1 11 8.88 2.08 58
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.5
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
import pandas as pd
df = pd.read_csv(##csv_name##)
#print(df)
print("wind:",df["wind"].rolling(5).mean().round(1))
print("temp:",df["temp"].rolling(5).mean().round(1))
print("rh:",df["rh"].rolling(5).mean().round(1))
试试这个。谢谢!
>>> df[['wind','temp','rh']].rolling(5).mean().sort_values(['wind','temp','rh'], ascending=False)
wind temp rh
12 13.24 -2.58 78.84
13 12.84 -3.72 75.82
11 11.26 -0.10 68.50
14 11.06 -3.40 70.36
4 9.92 1.16 72.74
5 9.64 1.60 67.08
10 8.88 2.08 58.00
6 7.64 3.04 57.02
9 6.88 3.26 51.50
7 5.28 4.70 43.84
8 5.00 4.02 46.26
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
先尝试 rolling mean + sort_values,na_position:
import pandas as pd
d = {'site': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A',
8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A'},
'year': {0: 1991, 1: 1991, 2: 1991, 3: 1991, 4: 1991, 5: 1991, 6: 1991,
7: 1991, 8: 1991, 9: 1991, 10: 1991, 11: 1991, 12: 1991, 13: 1991,
14: 1991},
'month': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
'day': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
'wind': {0: 5.3, 1: 12.6, 2: 14.7, 3: 11.8, 4: 5.2, 5: 3.9, 6: 2.6, 7: 2.9,
8: 10.4, 9: 14.6, 10: 13.9, 11: 14.5, 12: 12.8, 13: 8.4, 14: 5.7},
'temp': {0: 2.1, 1: -1.4, 2: -2.6, 3: 4.8, 4: 2.9, 5: 4.3, 6: 5.8, 7: 5.7,
8: 1.4, 9: -0.9, 10: -1.6, 11: -5.1, 12: -6.7, 13: -4.3, 14: 0.7},
'rh': {0: 80.4, 1: 85.0, 2: 95.1, 3: 57.3, 4: 45.9, 5: 52.1, 6: 34.7,
7: 29.2, 8: 69.4, 9: 72.1, 10: 84.6, 11: 87.2, 12: 80.9, 13: 54.3,
14: 44.8}}
df = pd.DataFrame(data=d)
cols = ['wind', 'temp', 'rh']
df[cols] = df[cols].rolling(window=5).mean()
df = df.sort_values(cols, ascending=False, na_position='first')
print(df)
df
:
site year month day wind temp rh
0 A 1991 1 1 NaN NaN NaN
1 A 1991 1 2 NaN NaN NaN
2 A 1991 1 3 NaN NaN NaN
3 A 1991 1 4 NaN NaN NaN
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.10 68.50
14 A 1991 1 15 11.06 -3.40 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.60 67.08
10 A 1991 1 11 8.88 2.08 58.00
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.50
7 A 1991 1 8 5.28 4.70 43.84
8 A 1991 1 9 5.00 4.02 46.26
我需要使用表征“事件”的多个 panda 列查找每小时数据从最高到最低的 5 天平均值并对其进行排名。我在 df 数据框中的数据如下所示。结果将显示连续 5 天平均正风、最低温度和最高相对湿度的排名连续日“事件”。我不确定如何计算连续 5 天的平均值,然后如何根据多个条件进行排名。谢谢!
site year month day wind temp rh
0 A 1991 1 1 5.3 2.1 80.4
1 A 1991 1 2 12.6 -1.4 85.0
2 A 1991 1 3 14.7 -2.6 95.1
3 A 1991 1 4 11.8 4.8 57.3
4 A 1991 1 5 5.2 2.9 45.9
5 A 1991 1 6 3.9 4.3 52.1
6 A 1991 1 7 2.6 5.8 34.7
7 A 1991 1 8 2.9 5.7 29.2
8 A 1991 1 9 10.4 1.4 69.4
9 A 1991 1 10 14.6 -0.9 72.1
10 A 1991 1 11 13.9 -1.6 84.6
11 A 1991 1 12 14.5 -5.1 87.2
12 A 1991 1 13 12.8 -6.7 80.9
13 A 1991 1 14 8.4 -4.3 54.3
14 A 1991 1 15 5.7 0.7 44.8
我试过像下面这样使用不同的滚动平均值选项,但出现“列表分配索引超出范围”错误:
df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh' ]).rolling(window=5).mean()
5 天滚动平均值应如下所示:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
6 A 1991 1 7 7.64 3.04 57.02
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
9 A 1991 1 10 6.88 3.26 51.5
10 A 1991 1 11 8.88 2.08 58
11 A 1991 1 12 11.26 -0.1 68.5
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
14 A 1991 1 15 11.06 -3.4 70.36
并且,最终输出应如下所示,风、温度、湿度的优先级顺序如下:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.1 68.5
14 A 1991 1 15 11.06 -3.4 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
10 A 1991 1 11 8.88 2.08 58
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.5
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
import pandas as pd
df = pd.read_csv(##csv_name##)
#print(df)
print("wind:",df["wind"].rolling(5).mean().round(1))
print("temp:",df["temp"].rolling(5).mean().round(1))
print("rh:",df["rh"].rolling(5).mean().round(1))
试试这个。谢谢!
>>> df[['wind','temp','rh']].rolling(5).mean().sort_values(['wind','temp','rh'], ascending=False)
wind temp rh
12 13.24 -2.58 78.84
13 12.84 -3.72 75.82
11 11.26 -0.10 68.50
14 11.06 -3.40 70.36
4 9.92 1.16 72.74
5 9.64 1.60 67.08
10 8.88 2.08 58.00
6 7.64 3.04 57.02
9 6.88 3.26 51.50
7 5.28 4.70 43.84
8 5.00 4.02 46.26
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
先尝试 rolling mean + sort_values,na_position:
import pandas as pd
d = {'site': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A',
8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A'},
'year': {0: 1991, 1: 1991, 2: 1991, 3: 1991, 4: 1991, 5: 1991, 6: 1991,
7: 1991, 8: 1991, 9: 1991, 10: 1991, 11: 1991, 12: 1991, 13: 1991,
14: 1991},
'month': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
'day': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
'wind': {0: 5.3, 1: 12.6, 2: 14.7, 3: 11.8, 4: 5.2, 5: 3.9, 6: 2.6, 7: 2.9,
8: 10.4, 9: 14.6, 10: 13.9, 11: 14.5, 12: 12.8, 13: 8.4, 14: 5.7},
'temp': {0: 2.1, 1: -1.4, 2: -2.6, 3: 4.8, 4: 2.9, 5: 4.3, 6: 5.8, 7: 5.7,
8: 1.4, 9: -0.9, 10: -1.6, 11: -5.1, 12: -6.7, 13: -4.3, 14: 0.7},
'rh': {0: 80.4, 1: 85.0, 2: 95.1, 3: 57.3, 4: 45.9, 5: 52.1, 6: 34.7,
7: 29.2, 8: 69.4, 9: 72.1, 10: 84.6, 11: 87.2, 12: 80.9, 13: 54.3,
14: 44.8}}
df = pd.DataFrame(data=d)
cols = ['wind', 'temp', 'rh']
df[cols] = df[cols].rolling(window=5).mean()
df = df.sort_values(cols, ascending=False, na_position='first')
print(df)
df
:
site year month day wind temp rh
0 A 1991 1 1 NaN NaN NaN
1 A 1991 1 2 NaN NaN NaN
2 A 1991 1 3 NaN NaN NaN
3 A 1991 1 4 NaN NaN NaN
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.10 68.50
14 A 1991 1 15 11.06 -3.40 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.60 67.08
10 A 1991 1 11 8.88 2.08 58.00
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.50
7 A 1991 1 8 5.28 4.70 43.84
8 A 1991 1 9 5.00 4.02 46.26