How to create a column that flags weight loss (1 = weight loss, 0 = no weight loss) of 10% or more from the previous measurement, based on a groupby of id?
I have a dataframe df:
import pandas as pd
df = pd.DataFrame({"CLIENT_ID": [8222, 8222, 8222, 8222, 8300, 8300, 8300, 8300, 8300],
                   "ENCOUNTER_DATE": ['2020-01-01', '2020-03-02', '2020-04-18', '2020-07-31', '2017-06-10', '2017-09-11', '2018-02-01', '2018-04-01', '2018-05-31'],
                   "WEIGHT_KG": [56, 58, 50, 54, 71, 72, 74, 75, 65]})
It is sorted by CLIENT_ID and ENCOUNTER_DATE:
CLIENT_ID | ENCOUNTER_DATE | WEIGHT_KG |
---|---|---|
8222 | 2020-01-01 | 56 |
8222 | 2020-03-02 | 58 |
8222 | 2020-04-18 | 50 |
8222 | 2020-07-31 | 54 |
8300 | 2017-06-10 | 71 |
8300 | 2017-09-11 | 72 |
8300 | 2018-02-01 | 74 |
8300 | 2018-04-01 | 75 |
8300 | 2018-05-31 | 65 |
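I want to create a WEIGHT_LOSS flag column that, for each CLIENT_ID, is 1 if the current WEIGHT_KG is at least 10% lower than the previous measurement and 0 otherwise, resulting in the table below:
CLIENT_ID | ENCOUNTER_DATE | WEIGHT_KG | WEIGHT_LOSS |
---|---|---|---|
8222 | 2020-01-01 | 56 | 0 |
8222 | 2020-03-02 | 58 | 0 |
8222 | 2020-04-18 | 50 | 1 |
8222 | 2020-07-31 | 54 | 0 |
8300 | 2017-06-10 | 71 | 0 |
8300 | 2017-09-11 | 72 | 0 |
8300 | 2018-02-01 | 74 | 0 |
8300 | 2018-04-01 | 75 | 0 |
8300 | 2018-05-31 | 65 | 1 |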
This can probably be answered easily with df.assign, np.where, or a list comprehension.
You can groupby the client and use pct_change on the "WEIGHT_KG" column:
df['WEIGHT_LOSS'] = (df.groupby('CLIENT_ID')
                       ['WEIGHT_KG']
                       .pct_change()  # calculate percent change
                       .lt(-0.1)      # loss if lower than -0.1 (-10%)
                       .astype(int)   # convert True/False to 1/0
                     )
Output:
   CLIENT_ID ENCOUNTER_DATE  WEIGHT_KG  WEIGHT_LOSS
0       8222     2020-01-01         56            0
1       8222     2020-03-02         58            0
2       8222     2020-04-18         50            1
3       8222     2020-07-31         54            0
4       8300     2017-06-10         71            0
5       8300     2017-09-11         72            0
6       8300     2018-02-01         74            0
7       8300     2018-04-01         75            0
8       8300     2018-05-31         65            1
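Since the question also mentions df.assign and np.where, here is a minimal sketch of an equivalent approach that compares each weight with the previous one via groupby(...).shift(). It is only an illustration, not part of the original answer. The explicit date parsing and sort are an added assumption, in case the frame is not already ordered. The `<=` comparison is a choice made here to follow the "at least 10%" wording, whereas `.lt(-0.1)` above is strict. The two agree on the sample data, and floating-point rounding can still matter for a drop of exactly 10%.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CLIENT_ID": [8222, 8222, 8222, 8222, 8300, 8300, 8300, 8300, 8300],
    "ENCOUNTER_DATE": ["2020-01-01", "2020-03-02", "2020-04-18", "2020-07-31",
                       "2017-06-10", "2017-09-11", "2018-02-01", "2018-04-01",
                       "2018-05-31"],
    "WEIGHT_KG": [56, 58, 50, 54, 71, 72, 74, 75, 65],
})

# Assumption: parse and sort so that "previous" means the earlier encounter per client.
df["ENCOUNTER_DATE"] = pd.to_datetime(df["ENCOUNTER_DATE"])
df = df.sort_values(["CLIENT_ID", "ENCOUNTER_DATE"]).reset_index(drop=True)

# Previous weight within each client; NaN for a client's first encounter.
prev_weight = df.groupby("CLIENT_ID")["WEIGHT_KG"].shift()

# Flag rows at or below 90% of the previous weight.
# Comparisons against NaN are False, so first encounters get 0.
df = df.assign(WEIGHT_LOSS=np.where(df["WEIGHT_KG"] <= 0.9 * prev_weight, 1, 0))

print(df)
```

On the sample data this flags the same two rows as the pct_change version (50 kg after 58 kg, and 65 kg after 75 kg).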