计算熊猫多列事件数据中的 5 天移动平均线并对其进行排序

Calculate & Rank 5-Day Moving Averages in Panda Multiple Column Event Data

我需要使用表征“事件”的多个 panda 列查找每小时数据从最高到最低的 5 天平均值并对其进行排名。我在 df 数据框中的数据如下所示。结果将显示连续 5 天平均正风、最低温度和最高相对湿度的排名连续日“事件”。我不确定如何计算连续 5 天的平均值,然后如何根据多个条件进行排名。谢谢!

     site     year     month     day     wind    temp    rh 
0    A        1991     1         1       5.3     2.1     80.4
1    A        1991     1         2       12.6    -1.4    85.0
2    A        1991     1         3       14.7    -2.6    95.1   
3    A        1991     1         4       11.8     4.8    57.3
4    A        1991     1         5       5.2     2.9     45.9
5    A        1991     1         6       3.9     4.3     52.1
6    A        1991     1         7       2.6     5.8     34.7
7    A        1991     1         8       2.9     5.7     29.2
8    A        1991     1         9       10.4    1.4     69.4
9    A        1991     1         10      14.6    -0.9    72.1
10   A        1991     1         11      13.9    -1.6    84.6
11   A        1991     1         12      14.5    -5.1    87.2
12   A        1991     1         13      12.8    -6.7    80.9
13   A        1991     1         14      8.4     -4.3    54.3
14   A        1991     1         15      5.7     0.7     44.8

我试过像下面这样使用不同的滚动平均值选项,但出现“列表分配索引超出范围”错误:

df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh' ]).rolling(window=5).mean()

5 天滚动平均值应如下所示:

    site    year    month   day wind    temp    rh
0   A   1991    1           1   n/a      n/a    n/a
1   A   1991    1           2   n/a      n/a    n/a
2   A   1991    1           3   n/a      n/a    n/a
3   A   1991    1           4   n/a      n/a    n/a
4   A   1991    1           5   9.92    1.16    72.74
5   A   1991    1           6   9.64    1.6     67.08
6   A   1991    1           7   7.64    3.04    57.02
7   A   1991    1           8   5.28    4.7     43.84
8   A   1991    1           9   5       4.02    46.26
9   A   1991    1           10  6.88    3.26    51.5
10  A   1991    1           11  8.88    2.08    58
11  A   1991    1           12  11.26   -0.1    68.5
12  A   1991    1           13  13.24   -2.58   78.84
13  A   1991    1           14  12.84   -3.72   75.82
14  A   1991    1           15  11.06   -3.4    70.36

并且,最终输出应如下所示,风、温度、湿度的优先级顺序如下:

    site    year    month   day wind    temp    rh
0   A   1991    1           1   n/a     n/a     n/a
1   A   1991    1           2   n/a     n/a     n/a
2   A   1991    1           3   n/a     n/a     n/a
3   A   1991    1           4   n/a     n/a     n/a
12  A   1991    1           13  13.24   -2.58   78.84
13  A   1991    1           14  12.84   -3.72   75.82
11  A   1991    1           12  11.26   -0.1    68.5
14  A   1991    1           15  11.06   -3.4    70.36
4   A   1991    1           5   9.92    1.16    72.74
5   A   1991    1           6   9.64    1.6     67.08
10  A   1991    1           11  8.88    2.08    58
6   A   1991    1           7   7.64    3.04    57.02
9   A   1991    1           10  6.88    3.26    51.5
7   A   1991    1           8   5.28    4.7     43.84
8   A   1991    1           9   5       4.02    46.26
import pandas as pd
df =  pd.read_csv(##csv_name##)
#print(df)
print("wind:",df["wind"].rolling(5).mean().round(1))
print("temp:",df["temp"].rolling(5).mean().round(1))
print("rh:",df["rh"].rolling(5).mean().round(1))

试试这个。谢谢!

>>> df[['wind','temp','rh']].rolling(5).mean().sort_values(['wind','temp','rh'], ascending=False)

     wind  temp     rh
12  13.24 -2.58  78.84
13  12.84 -3.72  75.82
11  11.26 -0.10  68.50
14  11.06 -3.40  70.36
4    9.92  1.16  72.74
5    9.64  1.60  67.08
10   8.88  2.08  58.00
6    7.64  3.04  57.02
9    6.88  3.26  51.50
7    5.28  4.70  43.84
8    5.00  4.02  46.26
0     NaN   NaN    NaN
1     NaN   NaN    NaN
2     NaN   NaN    NaN
3     NaN   NaN    NaN

先尝试 rolling mean + sort_values,na_position:

import pandas as pd

d = {'site': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A', 6: 'A', 7: 'A',
              8: 'A', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'A', 14: 'A'},
     'year': {0: 1991, 1: 1991, 2: 1991, 3: 1991, 4: 1991, 5: 1991, 6: 1991,
              7: 1991, 8: 1991, 9: 1991, 10: 1991, 11: 1991, 12: 1991, 13: 1991,
              14: 1991},
     'month': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
               10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
     'day': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
             10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
     'wind': {0: 5.3, 1: 12.6, 2: 14.7, 3: 11.8, 4: 5.2, 5: 3.9, 6: 2.6, 7: 2.9,
              8: 10.4, 9: 14.6, 10: 13.9, 11: 14.5, 12: 12.8, 13: 8.4, 14: 5.7},
     'temp': {0: 2.1, 1: -1.4, 2: -2.6, 3: 4.8, 4: 2.9, 5: 4.3, 6: 5.8, 7: 5.7,
              8: 1.4, 9: -0.9, 10: -1.6, 11: -5.1, 12: -6.7, 13: -4.3, 14: 0.7},
     'rh': {0: 80.4, 1: 85.0, 2: 95.1, 3: 57.3, 4: 45.9, 5: 52.1, 6: 34.7,
            7: 29.2, 8: 69.4, 9: 72.1, 10: 84.6, 11: 87.2, 12: 80.9, 13: 54.3,
            14: 44.8}}
df = pd.DataFrame(data=d)

cols = ['wind', 'temp', 'rh']

df[cols] = df[cols].rolling(window=5).mean()

df = df.sort_values(cols, ascending=False, na_position='first')

print(df)

df:

   site  year  month  day   wind  temp     rh
0     A  1991      1    1    NaN   NaN    NaN
1     A  1991      1    2    NaN   NaN    NaN
2     A  1991      1    3    NaN   NaN    NaN
3     A  1991      1    4    NaN   NaN    NaN
12    A  1991      1   13  13.24 -2.58  78.84
13    A  1991      1   14  12.84 -3.72  75.82
11    A  1991      1   12  11.26 -0.10  68.50
14    A  1991      1   15  11.06 -3.40  70.36
4     A  1991      1    5   9.92  1.16  72.74
5     A  1991      1    6   9.64  1.60  67.08
10    A  1991      1   11   8.88  2.08  58.00
6     A  1991      1    7   7.64  3.04  57.02
9     A  1991      1   10   6.88  3.26  51.50
7     A  1991      1    8   5.28  4.70  43.84
8     A  1991      1    9   5.00  4.02  46.26