What's the best way to filter a pandas Series and then set the values of the part selected?

Question

For example, I have a Series like this:

AFalse    0.220522 
BTrue    -1.050370
CFalse   -1.202922
DTrue     0.950305
EFalse    0.003110
FTrue     1.115483
GFalse    0.767281
HTrue    -1.376692
IFalse    1.729867
JTrue     2.574027
dtype: float64

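I want to select only the rows whose label contains 'True' and set their values to None. What is the best way to do this in terms of speed? I will be running this operation millions of times. Thanks.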

Answer 1

Using vectorized operations will help with speed. Taking your Series as s (assumed here to be named 'a', as in the output below):

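# Move the index labels into a regular column called 'index'
df = s.reset_index()
# Vectorized substring test on the labels
mask = df['index'].str.contains('True')
# None is stored as NaN in a float column
df.loc[mask, 'a'] = None
df.set_index('index')['a']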

returns

index
AFalse    0.220522
BTrue          NaN
CFalse   -1.202922
DTrue          NaN
EFalse    0.003110
FTrue          NaN
GFalse    0.767281
HTrue          NaN
IFalse    1.729867
JTrue          NaN
Name: a, dtype: float64
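A shorter variant of the same idea, offered as a sketch: the string test can be applied directly to the index, which avoids the round trip through a DataFrame. The sample values are the first four rows from the question; constructing the Series this way is an assumption about how s was created.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (first four rows only).
s = pd.Series(
    [0.220522, -1.050370, -1.202922, 0.950305],
    index=["AFalse", "BTrue", "CFalse", "DTrue"],
    name="a",
)

# Vectorized substring test on the index labels; yields one boolean per row.
mask = s.index.str.contains("True")

# Assign NaN to the selected rows in place; the dtype stays float64.
s.loc[mask] = np.nan
print(s)
```

If the index labels stay the same between runs, the mask only needs to be computed once.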

Answer 2

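Initialize a random Series (with numpy imported as np and pandas as pd):

idx = ['ATrue', 'BFalse', 'CTrue', 'DFalse']  # index labels matching the output below
s = pd.Series(np.random.rand(4), index=idx, name='test')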

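Use a list comprehension to build a boolean mask over the index. Note that 'True' can appear anywhere within an index label. Then use .loc to set the values, as you have already done.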

mask = ['True' in i for i in s.index]  # one boolean per index label
s.loc[mask] = None  # or np.nan

>>> s
Out[50]: 
ATrue          None
BFalse    0.9134165
CTrue          None
DFalse    0.2530273
Name: test, dtype: object
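Since the question mentions running this millions of times, one option worth trying is to build the mask once and reuse it, assuming the index labels do not change between runs. The sketch below uses Series.mask, which returns a copy with the selected positions replaced by NaN. The helper name blank_rows and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def blank_rows(series: pd.Series, mask: np.ndarray) -> pd.Series:
    """Hypothetical helper: return a copy with the masked positions set to NaN."""
    return series.mask(mask)


labels = ["ATrue", "BFalse", "CTrue", "DFalse"]

# Build the boolean mask once, outside any hot loop.
mask = np.array(["True" in label for label in labels])

s = pd.Series([0.1, 0.2, 0.3, 0.4], index=labels, name="test")
result = blank_rows(s, mask)  # s itself is left unchanged
print(result)
```

Using NaN keeps the result as float64, unlike the object dtype shown in the output above. Whether this beats .loc assignment depends on the Series size and the pandas version, so it is worth timing both with timeit on realistic data.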

Tags: select, filtering, series, setvalue, pandas