How to create a column of incrementing string labels based on values in a column?
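I have a dataframe with three columns of time-series data. I want to add labels to the dataframe that increment based on the binary (1/-1) values in one of the columns. Below is a demonstration of the output I want ('Rate_labels' is built from 'Rate pct change'). I did this in Excel, but I would like to do it in Python.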
            Rate pct change  Sharpe Ratio PCA  long_only_ew  Rate_labels
28/06/2019               -1          0.000000      0.024448  neg rates 1
01/07/2019                1          0.000000      0.030487  pos rates 1
02/07/2019               -1          0.000000      0.036835  neg rates 2
03/07/2019               -1          0.000000      0.054662  neg rates 2
05/07/2019               -1          0.000000      0.055340  neg rates 2
08/07/2019                1          0.000000      0.050585  pos rates 2
09/07/2019                1          0.000000      0.059735  pos rates 2
10/07/2019                1          0.000000      0.064335  pos rates 2
11/07/2019                1          0.000000      0.066124  pos rates 2
12/07/2019                1         -0.002202      0.072657  pos rates 2
15/07/2019               -1          0.003897      0.074136  neg rates 3
16/07/2019               -1          0.003278      0.071436  neg rates 3
17/07/2019                1          0.012141      0.072065  pos rates 3
18/07/2019                1          0.007214      0.074099  pos rates 3
19/07/2019                1          0.006617      0.073397  pos rates 3
22/07/2019                1          0.009760      0.078266  pos rates 3
23/07/2019                1          0.003645      0.075539  pos rates 3
24/07/2019                1          0.016116      0.085452  pos rates 3
25/07/2019                1          0.007491      0.075281  pos rates 3
26/07/2019                1          0.016323      0.090989  pos rates 3
29/07/2019                1          0.011050      0.077088  pos rates 3
30/07/2019                1          0.011531      0.073027  pos rates 3
The 'Rate pct change' column (Rate) is derived as follows: Rate > 0 gives 1 and Rate < 0 gives -1.
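For reference, the ±1 encoding described above can be reproduced with numpy.sign. This is a minimal sketch that assumes a hypothetical raw rate series, since the question only shows the derived column. Note that np.sign maps an exact 0 to 0, a case the question does not cover.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rate values (not from the question), used only for illustration.
rates = pd.Series([-0.25, 0.10, -0.05, 0.0])

# np.sign gives -1 for negatives and 1 for positives; an exact 0 stays 0.
rate_pct_change = np.sign(rates).astype(int)
print(rate_pct_change.tolist())  # [-1, 1, -1, 0]
```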
How would I create 'Rate_labels' using Python?
Try using a combination of shift and cumsum:
df['Rate_labels'] = df['Rate pct change'].lt(0).map({True:'neg', False:'pos'}) + ' rates ' + (df['Rate pct change'].shift(1).ne(df['Rate pct change']) & df['Rate pct change'].eq(-1)).cumsum().astype(str)
Output:
>>> df
            Rate pct change  Sharpe Ratio PCA  long_only_ew  Rate_labels
28/06/2019               -1          0.000000      0.024448  neg rates 1
01/07/2019                1          0.000000      0.030487  pos rates 1
02/07/2019               -1          0.000000      0.036835  neg rates 2
03/07/2019               -1          0.000000      0.054662  neg rates 2
05/07/2019               -1          0.000000      0.055340  neg rates 2
08/07/2019                1          0.000000      0.050585  pos rates 2
09/07/2019                1          0.000000      0.059735  pos rates 2
10/07/2019                1          0.000000      0.064335  pos rates 2
11/07/2019                1          0.000000      0.066124  pos rates 2
12/07/2019                1         -0.002202      0.072657  pos rates 2
15/07/2019               -1          0.003897      0.074136  neg rates 3
16/07/2019               -1          0.003278      0.071436  neg rates 3
17/07/2019                1          0.012141      0.072065  pos rates 3
18/07/2019                1          0.007214      0.074099  pos rates 3
19/07/2019                1          0.006617      0.073397  pos rates 3
22/07/2019                1          0.009760      0.078266  pos rates 3
23/07/2019                1          0.003645      0.075539  pos rates 3
24/07/2019                1          0.016116      0.085452  pos rates 3
25/07/2019                1          0.007491      0.075281  pos rates 3
26/07/2019                1          0.016323      0.090989  pos rates 3
29/07/2019                1          0.011050      0.077088  pos rates 3
30/07/2019                1          0.011531      0.073027  pos rates 3
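To see why the one-liner works, here is a sketch that splits it into named intermediate steps and runs them on the 'Rate pct change' values from the sample table. The intermediate names (changed, starts_neg_run, run_id) are illustrative only and are not part of the answer.

```python
import pandas as pd

# The 'Rate pct change' values from the sample table, in row order.
rate = pd.Series([-1, 1, -1, -1, -1, 1, 1, 1, 1, 1,
                  -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                 name="Rate pct change")

# Step 1: True wherever the value differs from the previous row
# (the first row is compared against NaN, so it is always True).
changed = rate.shift(1).ne(rate)

# Step 2: keep only the changes that start a new run of -1 values.
starts_neg_run = changed & rate.eq(-1)

# Step 3: cumsum counts each True as 1, so the counter steps up once per negative run.
run_id = starts_neg_run.cumsum()

# Step 4: build the text label from the sign plus the counter.
sign = rate.lt(0).map({True: "neg", False: "pos"})
labels = sign + " rates " + run_id.astype(str)

print(labels.tolist()[:4])  # ['neg rates 1', 'pos rates 1', 'neg rates 2', 'neg rates 2']
```

The key design choice is that the counter only advances on a change *into* -1, so each "neg" run and the "pos" run that follows it share the same number.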
@richardec's one-liner is great; I really like it, it's very Pythonic.
If you'd like another perspective, this is how I did it:
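# Text part of the label, from the sign of each row
df.loc[df['Rate pct change'] < 0, 'Rate_lab'] = 'neg rates'
df.loc[df['Rate pct change'] >= 0, 'Rate_lab'] = 'pos rates'
# Counter that steps up at the first row of every negative run
df['cmp'] = (df['Rate_lab'].ne(df['Rate_lab'].shift()) & df['Rate pct change'].eq(-1)).cumsum()
df['Rate_labels'] = df['Rate_lab'] + ' ' + df['cmp'].astype(str)
# Drop the helper columns
df = df.drop(['Rate_lab', 'cmp'], axis=1)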
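A sketch that wraps the helper-column approach in a reusable function and cross-checks it against the shift/cumsum one-liner on the sample data. The function name add_rate_labels is hypothetical, and unlike the code above it works on a copy rather than adding and then dropping helper columns on df.

```python
import pandas as pd


def add_rate_labels(df: pd.DataFrame, col: str = "Rate pct change") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a 'Rate_labels' column."""
    out = df.copy()
    # Text part of the label from the sign of each row (0 counts as 'pos').
    lab = out[col].lt(0).map({True: "neg rates", False: "pos rates"})
    # Counter that increases by 1 at the first row of every negative run.
    run_id = (lab.ne(lab.shift()) & out[col].eq(-1)).cumsum()
    out["Rate_labels"] = lab + " " + run_id.astype(str)
    return out


# The 'Rate pct change' column from the question (default integer index).
values = [-1, 1, -1, -1, -1, 1, 1, 1, 1, 1,
          -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"Rate pct change": values})

# The shift/cumsum one-liner from the first answer, for comparison.
s = df["Rate pct change"]
one_liner = (s.lt(0).map({True: "neg", False: "pos"}) + " rates "
             + (s.shift(1).ne(s) & s.eq(-1)).cumsum().astype(str))

result = add_rate_labels(df)
print(result["Rate_labels"].tolist() == one_liner.tolist())  # True
```

Returning a copy leaves the caller's frame unchanged, so there are no temporary columns to drop afterwards.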