Pandas conditional group by min()
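I am trying to get the minimum of a date variable for the rows where the principal balance is below 5% of the disbursement amount. I want this computed per account number, but I don't want a new df grouped by account number.
My df looks like this: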
| account_number | period_date | principal_balance_amt | disbursement_amt |
| -------------: | ----------- | --------------------- | ---------------- |
| 1 | 2021-01-01 | 10 | 100 |
| 1 | 2021-02-01 | 6 | 100 |
| 1 | 2021-03-01 | 3 | 100 |
| 1 | 2021-04-01 | 0 | 100 |
| 2 | 2021-01-01 | 20 | 100 |
| 2 | 2021-02-01 | 15 | 100 |
| 2 | 2021-03-01 | 11 | 100 |
| 2 | 2021-04-01 | 8 | 100 |
I have tried code like the following to get this working, but it just returns invalid syntax.

```python
df['churn_date'] = df.loc[groupby('account_number').(df['principal_balance_amt'] <= 0.05 * df['disbursement_amt']), 'period_date'].min()
```

I want the code to create a df like this:
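| account_number | period_date | principal_balance_amt | disbursement_amt | churn_date |
| -------------: | ----------- | --------------------- | ---------------- | ---------- |
| 1 | 2021-01-01 | 10 | 100 | 2021-03-01 |
| 1 | 2021-02-01 | 6 | 100 | 2021-03-01 |
| 1 | 2021-03-01 | 3 | 100 | 2021-03-01 |
| 1 | 2021-04-01 | 0 | 100 | 2021-03-01 |
| 2 | 2021-01-01 | 20 | 100 | nan |
| 2 | 2021-02-01 | 15 | 100 | nan |
| 2 | 2021-03-01 | 11 | 100 | nan |
| 2 | 2021-04-01 | 8 | 100 | nan |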
For the new column, use `Series.where` to replace `period_date` with missing values where the condition does not match, then use `GroupBy.transform` with `min`:
```python
mask = (df['principal_balance_amt'] <= 0.05 * df['disbursement_amt'])
df['churn_date'] = (df.assign(new=df['period_date'].where(mask))
                      .groupby('account_number')['new']
                      .transform('min'))
print(df)
```

```
   account_number period_date  principal_balance_amt  disbursement_amt churn_date
0               1  2021-01-01                     10               100 2021-03-01
1               1  2021-02-01                      6               100 2021-03-01
2               1  2021-03-01                      3               100 2021-03-01
3               1  2021-04-01                      0               100 2021-03-01
4               2  2021-01-01                     20               100        NaT
5               2  2021-02-01                     15               100        NaT
6               2  2021-03-01                     11               100        NaT
7               2  2021-04-01                      8               100        NaT
```
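To see why this works, it can help to look at the masked column on its own before the grouped minimum is broadcast back. The sketch below rebuilds the sample data from the question; it assumes `period_date` is already a datetime column (which the `NaT` values in the output suggest), and the names `masked_dates` and `churn_date` are only illustrative.

```python
import pandas as pd

# Sample data from the question; period_date is assumed to be datetime64.
df = pd.DataFrame({
    "account_number": [1, 1, 1, 1, 2, 2, 2, 2],
    "period_date": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"] * 2
    ),
    "principal_balance_amt": [10, 6, 3, 0, 20, 15, 11, 8],
    "disbursement_amt": [100] * 8,
})

mask = df["principal_balance_amt"] <= 0.05 * df["disbursement_amt"]

# Dates survive only on rows where the condition holds; the rest become NaT.
masked_dates = df["period_date"].where(mask)
print(masked_dates)

# transform('min') computes one minimum per account (skipping NaT) and
# repeats it on every row of that account, so the result aligns with df.
churn_date = masked_dates.groupby(df["account_number"]).transform("min")
print(churn_date)
```

Because `transform` returns a result with the same index as the input, it can be assigned straight back as a column without building a separate grouped frame.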
An alternative solution: filter the rows by boolean indexing, aggregate them with `min`, and map the result back with `Series.map`:
```python
mask = (df['principal_balance_amt'] <= 0.05 * df['disbursement_amt'])
s = df[mask].groupby('account_number')['period_date'].min()
df['churn_date'] = df['account_number'].map(s)
```
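As a rough check, the two approaches can be run side by side on the sample data. This is a minimal sketch, again assuming `period_date` is a datetime column; `via_map` and `via_transform` are hypothetical names used only for the comparison.

```python
import pandas as pd

# Sample data from the question; period_date is assumed to be datetime64.
df = pd.DataFrame({
    "account_number": [1, 1, 1, 1, 2, 2, 2, 2],
    "period_date": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"] * 2
    ),
    "principal_balance_amt": [10, 6, 3, 0, 20, 15, 11, 8],
    "disbursement_amt": [100] * 8,
})

mask = df["principal_balance_amt"] <= 0.05 * df["disbursement_amt"]

# Aggregate only the matching rows: one entry per account that has at least
# one matching row. Account 2 has none, so it is absent from s.
s = df[mask].groupby("account_number")["period_date"].min()

# map looks up each row's account_number in the index of s;
# accounts missing from s come back as NaT.
via_map = df["account_number"].map(s)

# The where + transform approach, for comparison.
via_transform = (
    df["period_date"].where(mask)
      .groupby(df["account_number"])
      .transform("min")
)

print(pd.DataFrame({"via_map": via_map, "via_transform": via_transform}))
```

Both columns should agree row by row. The `map` version aggregates only the filtered rows, while the `where` version keeps every row and relies on `min` skipping missing values.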