How to fill NaN values based on the most frequent strings above and below

I have a DataFrame of string values, some of which are missing. The missing values need to be populated/filled according to the following conditions.

My DataFrame:

     reading
0       talk
1       kill
2        NaN
3   vertical
4       type
5       kill
6        NaN
7   vertical
8   vertical
9       type
10   durable
11       NaN
12   durable
13  vertical

Expected output:

     reading
0       talk
1       kill
2       kill
3   vertical
4       type
5       kill
6   vertical
7   vertical
8   vertical
9       type
10   durable
11  vertical
12   durable
13  vertical

Here is a minimal reproducible example:

import pandas as pd
import numpy as np

# np.nan is the canonical spelling; the np.NAN alias was removed in NumPy 2.0
df = pd.DataFrame({'reading': ['talk', 'kill', np.nan, 'vertical', 'type', 'kill', np.nan, 'vertical', 'vertical', 'type', 'durable', np.nan, 'durable', 'vertical']})

def filldf(df):
    # Do the logic here
    return df

I'm not sure how to approach this. Any help would be greatly appreciated!

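If you don't have too many NaN values, you can loop over the index labels of the NaN 'reading' values, take the mode of the values around each one, and assign the results back to the corresponding NaN rows. Using .iloc[0] picks the first mode when several values tie (mode() returns them in sorted order). Note that min(0, i-3) is never greater than 0, so each slice runs from the first row through label i+3 rather than covering only the three rows above; on this data that reproduces the expected output. With max(0, i-3) the window becomes three rows above and three below, but index 11 then ties between 'durable' and 'vertical', and .mode().iloc[0] returns 'durable'.

msk = df['reading'].isna()
df.loc[msk, 'reading'] = [df.loc[min(0, i-3):i+3, 'reading'].mode().iloc[0] for i in df.index[msk]]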

Output:

     reading
0       talk
1       kill
2       kill
3   vertical
4       type
5       kill
6   vertical
7   vertical
8   vertical
9       type
10   durable
11  vertical
12   durable
13  vertical
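
For comparison, here is a minimal sketch of a strict window version, where each NaN is filled from at most three rows above and three rows below it. The function name fill_with_window_mode, the parameter k, and the tie rule are assumptions made for illustration; the question does not say how ties should be broken. Ties are resolved here in favour of the value that appears first in the window, which happens to reproduce the expected output at index 11, where 'vertical' and 'durable' both occur twice.

```python
from collections import Counter

import numpy as np
import pandas as pd


def fill_with_window_mode(s: pd.Series, k: int = 3) -> pd.Series:
    """Fill each NaN with the most frequent value among up to k rows above and below.

    Hypothetical helper: the tie rule (first value seen in the window wins)
    is an assumption, not something stated in the question.
    """
    out = s.copy()
    for pos in np.flatnonzero(s.isna().to_numpy()):
        # Positional window; max() keeps the start from going negative.
        # Reading from the original series means earlier fills do not
        # influence later ones.
        window = s.iloc[max(0, pos - k): pos + k + 1].dropna()
        if window.empty:
            continue  # no neighbours to vote with, so leave the NaN in place
        # Counter.most_common lists equal counts in first-encountered order.
        out.iloc[pos] = Counter(window).most_common(1)[0][0]
    return out


df = pd.DataFrame({'reading': ['talk', 'kill', np.nan, 'vertical', 'type',
                               'kill', np.nan, 'vertical', 'vertical', 'type',
                               'durable', np.nan, 'durable', 'vertical']})
df['reading'] = fill_with_window_mode(df['reading'])
print(df)
```

On the sample data this fills index 2 with 'kill' and indices 6 and 11 with 'vertical'. The explicit Python loop is fine when only a few values are missing; for a large number of NaN values it may be slow.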