如何根据 pandas DataFrame 中的条件从当前行值中减去上一行值?

how to subtract previous row value from current row value based on condition in pandas DataFrame?

在执行条件减法时,对于 ID 列中的值不等于 1 或 8 的那些行,我从列的当前行值中减去前一行值 (Jan - AnnualMean)。

怎么做?我尝试了以下,导致语法无效

for index,row in df.iterrows():
    if row["ID"] =! [1,8]:
        row.transform('diff')

输入

ID  Jan Feb Mrz Apr Mai Jun Jul Aug Sep Okt Nov Dez AnnualMean
1   14  18  17  45  22  31  30  4   22  26  12  48  24
2   8   35  21  31  50  49  20  29  17  49  17  3   27
3   19  4   8   27  29  37  25  7   2   2   49  28  20
4   3   50  49  20  17  29  35  39  8   42  41  34  31
5   33  2   12  15  49  49  46  25  39  11  42  38  30
6   12  19  14  38  8   42  5   34  36  29  12  50  25
7   16  48  29  14  41  6   9   3   4   33  12  4   18
8   25  24  4   26  7   45  17  2   47  17  19  3   20
9   47  36  34  24  17  45  3   32  27  15  46  49  31
10  50  15  42  45  13  9   31  10  49  1   30  37  28
1   22  26  32  50  22  30  48  27  19  27  44  19  31
2   27  45  43  7   48  13  43  1   45  8   11  4   25
3   24  4   12  5   10  49  24  16  10  42  46  25  22
4   45  32  21  5   30  5   27  23  4   8   21  23  20
5   38  28  4   8   4   20  36  13  11  14  11  11  17
6   42  46  28  42  46  43  7   8   40  30  33  1   31
7   42  11  37  33  16  27  9   23  42  40  29  35  29
8   40  27  45  24  28  34  4   10  28  16  41  27  27
9   4   4   1   6   8   34  43  48  10  10  37  29  20
10  39  17  18  23  27  32  14  15  8   45  28  40  26

期望输出:

    ID  Jan Feb Mrz Apr Mai Jun Jul Aug Sep Okt Nov Dez AnnualMean
1   14  18  17  45  22  31  30  4   22  26  12  48  24
2   -6  17  4   -14 28  18  -10 25  -5  23  5   -45 3
3   11  -31 -13 -4  -21 -12 5   -22 -15 -47 32  25  -7
4   -16 46  41  -7  -12 -8  10  32  6   40  -8  6   11
5   30  -48 -37 -5  32  20  11  -14 31  -31 1   4   -1
6   -21 17  2   23  -41 -7  -41 9   -3  18  -30 12  -5
7   4   29  15  -24 33  -36 4   -31 -32 4   0   -46 -7
8   25  24  4   26  7   45  17  2   47  17  19  3   20
9   22  12  30  -2  10  0   -14 30  -20 -2  27  46  11
10  3   -21 8   21  -4  -36 28  -22 22  -14 -16 -12 -3
1   22  26  32  50  22  30  48  27  19  27  44  19  31
2   5   19  11  -43 26  -17 -5  -26 26  -19 -33 -15 -6
3   -3  -41 -31 -2  -38 36  -19 15  -35 34  35  21  -3
4   21  28  9   0   20  -44 3   7   -6  -34 -25 -2  -2
5   -7  -4  -17 3   -26 15  9   -10 7   6   -10 -12 -3
6   4   18  24  34  42  23  -29 -5  29  16  22  -10 14
7   0   -35 9   -9  -30 -16 2   15  2   10  -4  34  -2
8   40  27  45  24  28  34  4   10  28  16  41  27  27
9   -36 -23 -44 -18 -20 0   39  38  -18 -6  -4  2   -7
10  35  13  17  17  19  -2  -29 -33 -2  35  -9  11  6

更新

每次 ID 为 1 或 8 时,您必须创建一个组,然后对每个组应用 diff。最后,用原始数据为每个 ID 1 和 8 填充值:

idx = pd.IndexSlice[:, 'Jan':'AnnualMean']
df.loc[idx] = df.loc[idx].groupby(df['ID'].isin([1, 8]).cumsum()) \
                         .diff().fillna(df).astype(int)

输出:

>>> df
    ID  Jan  Feb  Mrz  Apr  Mai  Jun  Jul  Aug  Sep  Okt  Nov  Dez  AnnualMean
0    1   14   18   17   45   22   31   30    4   22   26   12   48          24
1    2   -6   17    4  -14   28   18  -10   25   -5   23    5  -45           3
2    3   11  -31  -13   -4  -21  -12    5  -22  -15  -47   32   25          -7
3    4  -16   46   41   -7  -12   -8   10   32    6   40   -8    6          11
4    5   30  -48  -37   -5   32   20   11  -14   31  -31    1    4          -1
5    6  -21   17    2   23  -41   -7  -41    9   -3   18  -30   12          -5
6    7    4   29   15  -24   33  -36    4  -31  -32    4    0  -46          -7
7    8   25   24    4   26    7   45   17    2   47   17   19    3          20
8    9   22   12   30   -2   10    0  -14   30  -20   -2   27   46          11
9   10    3  -21    8   21   -4  -36   28  -22   22  -14  -16  -12          -3
10   1   22   26   32   50   22   30   48   27   19   27   44   19          31
11   2    5   19   11  -43   26  -17   -5  -26   26  -19  -33  -15          -6
12   3   -3  -41  -31   -2  -38   36  -19   15  -35   34   35   21          -3
13   4   21   28    9    0   20  -44    3    7   -6  -34  -25   -2          -2
14   5   -7   -4  -17    3  -26   15    9  -10    7    6  -10  -12          -3
15   6    4   18   24   34   42   23  -29   -5   29   16   22  -10          14
16   7    0  -35    9   -9  -30  -16    2   15    2   10   -4   34          -2
17   8   40   27   45   24   28   34    4   10   28   16   41   27          27
18   9  -36  -23  -44  -18  -20    0   39   38  -18   -6   -4    2          -7
19  10   35   13   17   17   19   -2  -29  -33   -2   35   -9   11           6

旧答案

使用where:

>>> df.set_index('ID').where(~df['ID'].between(1, 8), other=df.set_index('ID').diff()).reset_index().fillna(df)

    ID   Jan   Feb   Mrz   Apr   Mai   Jun   Jul   Aug   Sep   Okt   Nov   Dez  AnnualMean
0    1  14.0  18.0  17.0  45.0  22.0  31.0  30.0   4.0  22.0  26.0  12.0  48.0        24.0
1    2  -6.0  17.0   4.0 -14.0  28.0  18.0 -10.0  25.0  -5.0  23.0   5.0 -45.0         3.0
2    3  11.0 -31.0 -13.0  -4.0 -21.0 -12.0   5.0 -22.0 -15.0 -47.0  32.0  25.0        -7.0
3    4 -16.0  46.0  41.0  -7.0 -12.0  -8.0  10.0  32.0   6.0  40.0  -8.0   6.0        11.0
4    5  30.0 -48.0 -37.0  -5.0  32.0  20.0  11.0 -14.0  31.0 -31.0   1.0   4.0        -1.0
5    6 -21.0  17.0   2.0  23.0 -41.0  -7.0 -41.0   9.0  -3.0  18.0 -30.0  12.0        -5.0
6    7   4.0  29.0  15.0 -24.0  33.0 -36.0   4.0 -31.0 -32.0   4.0   0.0 -46.0        -7.0
7    8  25.0  24.0   4.0  26.0   7.0  45.0  17.0   2.0  47.0  17.0  19.0   3.0        20.0
8    9  47.0  36.0  34.0  24.0  17.0  45.0   3.0  32.0  27.0  15.0  46.0  49.0        31.0
9   10   3.0 -21.0   8.0  21.0  -4.0 -36.0  28.0 -22.0  22.0 -14.0 -16.0 -12.0        -3.0
10   1 -28.0  11.0 -10.0   5.0   9.0  21.0  17.0  17.0 -30.0  26.0  14.0 -18.0         3.0
11   2   5.0  19.0  11.0 -43.0  26.0 -17.0  -5.0 -26.0  26.0 -19.0 -33.0 -15.0        -6.0
12   3  -3.0 -41.0 -31.0  -2.0 -38.0  36.0 -19.0  15.0 -35.0  34.0  35.0  21.0        -3.0
13   4  21.0  28.0   9.0   0.0  20.0 -44.0   3.0   7.0  -6.0 -34.0 -25.0  -2.0        -2.0
14   5  -7.0  -4.0 -17.0   3.0 -26.0  15.0   9.0 -10.0   7.0   6.0 -10.0 -12.0        -3.0
15   6   4.0  18.0  24.0  34.0  42.0  23.0 -29.0  -5.0  29.0  16.0  22.0 -10.0        14.0
16   7   0.0 -35.0   9.0  -9.0 -30.0 -16.0   2.0  15.0   2.0  10.0  -4.0  34.0        -2.0
17   8  40.0  27.0  45.0  24.0  28.0  34.0   4.0  10.0  28.0  16.0  41.0  27.0        27.0
18   9   4.0   4.0   1.0   6.0   8.0  34.0  43.0  48.0  10.0  10.0  37.0  29.0        20.0
19  10  35.0  13.0  17.0  17.0  19.0  -2.0 -29.0 -33.0  -2.0  35.0  -9.0  11.0         6.0

更新

根据您的评论:

idx = pd.IndexSlice[:, 'Jan':'AnnualMean']
df.loc[idx] = df.loc[idx].where(~df['ID'].between(1, 8), other=df.loc[idx]).diff().fillna(df)

您可以使用 loc 到 select 只有 rows/columns 在其中应用差异:

rows = df['ID'].between(1,8, inclusive='neither')
cols = df.columns!='ID'
df.loc[rows, cols] = df.diff()

输出:

    ID   Jan   Feb   Mrz   Apr   Mai   Jun   Jul   Aug   Sep   Okt   Nov   Dez  AnnualMean
0    1  14.0  18.0  17.0  45.0  22.0  31.0  30.0   4.0  22.0  26.0  12.0  48.0        24.0
1    2  -6.0  17.0   4.0 -14.0  28.0  18.0 -10.0  25.0  -5.0  23.0   5.0 -45.0         3.0
2    3  11.0 -31.0 -13.0  -4.0 -21.0 -12.0   5.0 -22.0 -15.0 -47.0  32.0  25.0        -7.0
3    4 -16.0  46.0  41.0  -7.0 -12.0  -8.0  10.0  32.0   6.0  40.0  -8.0   6.0        11.0
4    5  30.0 -48.0 -37.0  -5.0  32.0  20.0  11.0 -14.0  31.0 -31.0   1.0   4.0        -1.0
5    6 -21.0  17.0   2.0  23.0 -41.0  -7.0 -41.0   9.0  -3.0  18.0 -30.0  12.0        -5.0
6    7   4.0  29.0  15.0 -24.0  33.0 -36.0   4.0 -31.0 -32.0   4.0   0.0 -46.0        -7.0
7    8  25.0  24.0   4.0  26.0   7.0  45.0  17.0   2.0  47.0  17.0  19.0   3.0        20.0
8    9  47.0  36.0  34.0  24.0  17.0  45.0   3.0  32.0  27.0  15.0  46.0  49.0        31.0
9   10  50.0  15.0  42.0  45.0  13.0   9.0  31.0  10.0  49.0   1.0  30.0  37.0        28.0
10   1  22.0  26.0  32.0  50.0  22.0  30.0  48.0  27.0  19.0  27.0  44.0  19.0        31.0
11   2   5.0  19.0  11.0 -43.0  26.0 -17.0  -5.0 -26.0  26.0 -19.0 -33.0 -15.0        -6.0
12   3  -3.0 -41.0 -31.0  -2.0 -38.0  36.0 -19.0  15.0 -35.0  34.0  35.0  21.0        -3.0
13   4  21.0  28.0   9.0   0.0  20.0 -44.0   3.0   7.0  -6.0 -34.0 -25.0  -2.0        -2.0
14   5  -7.0  -4.0 -17.0   3.0 -26.0  15.0   9.0 -10.0   7.0   6.0 -10.0 -12.0        -3.0
15   6   4.0  18.0  24.0  34.0  42.0  23.0 -29.0  -5.0  29.0  16.0  22.0 -10.0        14.0
16   7   0.0 -35.0   9.0  -9.0 -30.0 -16.0   2.0  15.0   2.0  10.0  -4.0  34.0        -2.0
17   8  40.0  27.0  45.0  24.0  28.0  34.0   4.0  10.0  28.0  16.0  41.0  27.0        27.0
18   9   4.0   4.0   1.0   6.0   8.0  34.0  43.0  48.0  10.0  10.0  37.0  29.0        20.0
19  10  39.0  17.0  18.0  23.0  27.0  32.0  14.0  15.0   8.0  45.0  28.0  40.0        26.0