Stopping a forward fill based on a different string column changing in the DF
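I am new to Python, so go easy!
I have a dataframe like the one below. I want to forward fill the NaNs in the shares_owned column, but stop when the string in df['ticker'] changes, and only start filling again once another number appears in shares_owned.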
date | ticker | shares_owned | price |
---|---|---|---|
01/01/2020 | EZY | NaN | £2 |
02/01/2020 | EZY | 10 | £2.1 |
03/01/2020 | EZY | NaN | £2.12 |
04/01/2020 | EZY | NaN | £12.5 |
01/01/2020 | FTSE | NaN | £11 |
02/01/2020 | FTSE | NaN | £12 |
03/01/2020 | FTSE | 2 | £12.5 |
04/01/2020 | FTSE | NaN | £12.5 |
For example, the output table would look like this:
date | ticker | shares_owned | price |
---|---|---|---|
01/01/2020 | EZY | NaN | £2 |
02/01/2020 | EZY | 10 | £2.1 |
03/01/2020 | EZY | 10 | £2.12 |
04/01/2020 | EZY | 10 | £12.5 |
01/01/2020 | FTSE | NaN | £11 |
02/01/2020 | FTSE | NaN | £12 |
03/01/2020 | FTSE | 2 | £12.5 |
04/01/2020 | FTSE | 2 | £12.5 |
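So far I have been trying to use .fillna(method='ffill'), to no avail.
- You noted the grouping, so use groupby() to group the rows by ticker.
- Within each group, forward fill with fillna(method="ffill") (equivalently, ffill()).
- Apply that within each group using transform(), so the result keeps the original row order and index.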
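import io
import pandas as pd

# Rebuild the sample frame; the columns are whitespace-separated here.
df = pd.read_csv(io.StringIO("""date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY NaN £2.12
04/01/2020 EZY NaN £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE NaN £12.5"""), sep=r"\s+")
# Forward fill within each ticker so a value never carries over into the next ticker.
# s.ffill() is equivalent to s.fillna(method="ffill"), which newer pandas versions deprecate.
df["shares_owned"] = df.groupby("ticker")["shares_owned"].transform(lambda s: s.ffill())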
Output
| | date | ticker | shares_owned | price |
---|---|---|---|---|
0 | 01/01/2020 | EZY | nan | £2 |
1 | 02/01/2020 | EZY | 10 | £2.1 |
2 | 03/01/2020 | EZY | 10 | £2.12 |
3 | 04/01/2020 | EZY | 10 | £12.5 |
4 | 01/01/2020 | FTSE | nan | £11 |
5 | 02/01/2020 | FTSE | nan | £12 |
6 | 03/01/2020 | FTSE | 2 | £12.5 |
7 | 04/01/2020 | FTSE | 2 | £12.5 |
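To see why a plain forward fill does not work here, the sketch below compares it with the grouped version on the same sample data. It also includes one variant that is not part of the answer above: if a ticker could reappear in separate, non-adjacent blocks, grouping by consecutive runs (the `run_id` helper is a hypothetical name introduced for illustration) would stop the fill at every change of ticker, which is one reading of the question's wording. Treat it as a minimal sketch under those assumptions.

```python
import io

import pandas as pd

# Same sample data as above, whitespace-separated.
raw = """date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY NaN £2.12
04/01/2020 EZY NaN £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE NaN £12.5"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Plain forward fill ignores ticker boundaries:
# EZY's 10 leaks into the first two FTSE rows (index 4 and 5).
plain = df["shares_owned"].ffill()

# Grouped forward fill restarts for each ticker, so those rows stay NaN.
grouped = df.groupby("ticker")["shares_owned"].ffill()

# Assumed variant (not in the original answer): label consecutive runs of the
# same ticker, so the fill stops at every change even if a ticker reappears later.
run_id = (df["ticker"] != df["ticker"].shift()).cumsum().rename("run")
by_run = df.groupby(run_id)["shares_owned"].ffill()

comparison = pd.DataFrame(
    {"ticker": df["ticker"], "plain": plain, "grouped": grouped, "by_run": by_run}
)
print(comparison)
```

On this sample each ticker occupies a single contiguous block, so `grouped` and `by_run` give the same result; they would differ only if a ticker showed up again after another ticker's rows.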