How to create a column of incrementing string labels based on values in a column?

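I have a dataframe with three columns of time-series data. I want to add labels to the dataframe based on the incrementing binary values in one of the columns. Below is a demonstration of the output I want ('Rate_labels' is built from 'Rate pct change'). I did this in Excel, but I would like to do it in Python.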

            Rate pct change      Sharpe Ratio PCA   long_only_ew    Rate_labels
28/06/2019        -1               0.000000             0.024448    neg rates 1
01/07/2019         1               0.000000             0.030487    pos rates 1
02/07/2019        -1               0.000000             0.036835    neg rates 2
03/07/2019        -1               0.000000             0.054662    neg rates 2
05/07/2019        -1               0.000000             0.055340    neg rates 2
08/07/2019         1               0.000000             0.050585    pos rates 2
09/07/2019         1               0.000000             0.059735    pos rates 2
10/07/2019         1               0.000000             0.064335    pos rates 2
11/07/2019         1               0.000000             0.066124    pos rates 2
12/07/2019         1              -0.002202             0.072657    pos rates 2
15/07/2019        -1               0.003897             0.074136    neg rates 3
16/07/2019        -1               0.003278             0.071436    neg rates 3
17/07/2019         1               0.012141             0.072065    pos rates 3
18/07/2019         1               0.007214             0.074099    pos rates 3
19/07/2019         1               0.006617             0.073397    pos rates 3
22/07/2019         1               0.009760             0.078266    pos rates 3
23/07/2019         1               0.003645             0.075539    pos rates 3
24/07/2019         1               0.016116             0.085452    pos rates 3
25/07/2019         1               0.007491             0.075281    pos rates 3
26/07/2019         1               0.016323             0.090989    pos rates 3
29/07/2019         1               0.011050             0.077088    pos rates 3
30/07/2019         1               0.011531             0.073027    pos rates 3

'Rate pct change' is derived from the rate: it is 1 where Rate > 0 and -1 where Rate < 0.
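For context, that derivation can be sketched roughly as below. The raw `Rate` column and its values are placeholders made up for illustration; only the sign logic matters here.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rate changes; the values are illustrative only.
rates = pd.DataFrame({"Rate": [-0.02, 0.01, -0.03, -0.01, 0.04]})

# np.sign maps negatives to -1 and positives to 1 (an exact 0 would stay 0).
rates["Rate pct change"] = np.sign(rates["Rate"]).astype(int)
print(rates["Rate pct change"].tolist())
```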

How would I create 'Rate_labels' using Python?

Try using a combination of shift and cumsum:

df['Rate_labels'] = df['Rate pct change'].lt(0).map({True:'neg', False:'pos'}) + ' rates ' + (df['Rate pct change'].shift(1).ne(df['Rate pct change']) & df['Rate pct change'].eq(-1)).cumsum().astype(str)

Output:

>>> df
            Rate pct change  Sharpe Ratio PCA  long_only_ew  Rate_labels
28/06/2019               -1          0.000000      0.024448  neg rates 1
01/07/2019                1          0.000000      0.030487  pos rates 1
02/07/2019               -1          0.000000      0.036835  neg rates 2
03/07/2019               -1          0.000000      0.054662  neg rates 2
05/07/2019               -1          0.000000      0.055340  neg rates 2
08/07/2019                1          0.000000      0.050585  pos rates 2
09/07/2019                1          0.000000      0.059735  pos rates 2
10/07/2019                1          0.000000      0.064335  pos rates 2
11/07/2019                1          0.000000      0.066124  pos rates 2
12/07/2019                1         -0.002202      0.072657  pos rates 2
15/07/2019               -1          0.003897      0.074136  neg rates 3
16/07/2019               -1          0.003278      0.071436  neg rates 3
17/07/2019                1          0.012141      0.072065  pos rates 3
18/07/2019                1          0.007214      0.074099  pos rates 3
19/07/2019                1          0.006617      0.073397  pos rates 3
22/07/2019                1          0.009760      0.078266  pos rates 3
23/07/2019                1          0.003645      0.075539  pos rates 3
24/07/2019                1          0.016116      0.085452  pos rates 3
25/07/2019                1          0.007491      0.075281  pos rates 3
26/07/2019                1          0.016323      0.090989  pos rates 3
29/07/2019                1          0.011050      0.077088  pos rates 3
30/07/2019                1          0.011531      0.073027  pos rates 3
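
If the one-liner is hard to read, it may help to split it into its intermediate steps. The sketch below does that on a short made-up sample of the +1/-1 column; the variable names `changed`, `starts_neg_run` and `counter` are only for illustration.

```python
import pandas as pd

# Small made-up sample of the +1/-1 column from the question.
s = pd.Series([-1, 1, -1, -1, 1, 1, -1], name="Rate pct change")

# True wherever the value differs from the previous row (the first row
# is compared against NaN, so it counts as a change).
changed = s.shift(1).ne(s)

# Keep only the changes that begin a run of -1 values.
starts_neg_run = changed & s.eq(-1)

# A running total of those starts gives the number used in the label.
counter = starts_neg_run.cumsum()

# Combine the sign text with the counter, as in the one-liner.
labels = s.lt(0).map({True: "neg", False: "pos"}) + " rates " + counter.astype(str)
print(labels.tolist())
```

Because the counter only increases when a negative run starts, each positive run reuses the number of the negative run before it, which is the pairing shown in the desired output.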

@richardec's one-liner is great, I really like it, very Pythonic.
If you would like another approach, this is how I would do it:

col = 'Rate pct change'  # the +1/-1 column shown in the question
df.loc[df[col] < 0, 'Rate_lab'] = 'neg rates'
df.loc[df[col] >= 0, 'Rate_lab'] = 'pos rates'
# Increment the counter each time a new run of -1 values starts
df['cmp'] = (df['Rate_lab'].ne(df['Rate_lab'].shift()) & df[col].eq(-1)).cumsum()
df['Rate_labels'] = df['Rate_lab'] + ' ' + df['cmp'].astype(str)
df = df.drop(['Rate_lab', 'cmp'], axis=1)
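
To try this end to end, here is a minimal self-contained sketch using the first six rows from the question. The frame construction is my own scaffolding, I index the column by its full name 'Rate pct change', and the final comparison against the one-liner is only a sanity check.

```python
import pandas as pd

# First six rows of the question's data (dates kept as plain strings).
df = pd.DataFrame(
    {"Rate pct change": [-1, 1, -1, -1, -1, 1]},
    index=["28/06/2019", "01/07/2019", "02/07/2019",
           "03/07/2019", "05/07/2019", "08/07/2019"],
)
col = "Rate pct change"

# Temporary text label per row.
df.loc[df[col] < 0, "Rate_lab"] = "neg rates"
df.loc[df[col] >= 0, "Rate_lab"] = "pos rates"

# Temporary counter: goes up by one whenever a run of -1 values starts.
df["cmp"] = (df["Rate_lab"].ne(df["Rate_lab"].shift()) & df[col].eq(-1)).cumsum()

# Final label, then drop the helper columns.
df["Rate_labels"] = df["Rate_lab"] + " " + df["cmp"].astype(str)
df = df.drop(["Rate_lab", "cmp"], axis=1)

# Sanity check against the one-liner from the other answer.
one_liner = (
    df[col].lt(0).map({True: "neg", False: "pos"})
    + " rates "
    + (df[col].shift(1).ne(df[col]) & df[col].eq(-1)).cumsum().astype(str)
)
print(df["Rate_labels"].tolist())
print((df["Rate_labels"] == one_liner).all())
```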