What is the 'fillna()' equivalent for dtype 'Int32'?

Question

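Short question: how do I set all values that are <1 or <NA> to 1?

Long question: suppose I have a pandas column of plain integers (int32!). I used to be able to do this to enforce a minimum value: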

>>> import numpy as np
>>> import pandas as pd
>>> shots = pd.DataFrame([2, 0, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='int32')
>>> shots
     shots
foo      2
bar      0
baz      1

>>> max(shots.loc['foo', 'shots'], 1)
2

>>> max(shots.loc['bar', 'shots'], 1)
1

So far, so good. Now suppose the dtype of the shots column changes from 'int32' to 'Int32', which allows <NA>. This gets me into trouble when I access a <NA> record. I get this error:

>>> shots = pd.DataFrame([2, np.nan, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='Int32')
>>> shots
     shots
foo      2
bar   <NA>
baz      1

>>> max(shots.loc['bar', 'shots'], 1)
TypeError: boolean value of NA is ambiguous

What should I do?

My first instinct was "fine, let's fill the values and then apply max()". But that fails too:

>>> shots.loc[idx, 'shots'].fillna(1)

AttributeError: 'NAType' object has no attribute 'fillna'

--> What is the most pandastic/Pythonic way to apply a condition to a <NA> value, i.e. set all <NA> to 1, or apply some other basic operation such as max(<NA>, 1)?

Versions

Python 3.8.6
Pandas 1.2.3
NumPy 1.19.2

Answer 1

idx should be a collection (for example a list). If it is a scalar, you get a scalar value back:

# idx = 'bar'

>>> shots.loc[idx, 'shots']
<NA>

>>> shots.loc[idx, 'shots'].fillna(1)
...
AttributeError: 'NAType' object has no attribute 'fillna'

>>> shots.loc[[idx], 'shots'].fillna(1)
bar    1
Name: shots, dtype: Int32

The question is: how is idx defined?
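
For the short question (set everything that is <1 or <NA> to 1), one possible column-wide approach is to fill the missing values first and then clip from below. This is only a sketch: the sample data mirrors the question, and the variable name `floored` is made up for illustration.

```python
import pandas as pd

# Sample data in the spirit of the question: one missing value, one value below 1.
shots = pd.DataFrame(
    {"shots": [2, pd.NA, 0]},
    index=["foo", "bar", "baz"],
    dtype="Int32",
)

# Operate on the whole column (a Series), not on a single scalar:
# first replace <NA> with 1, then raise anything below 1 up to 1.
floored = shots["shots"].fillna(1).clip(lower=1)
print(floored)
```

Because both calls operate on a Series, there is no scalar <NA> involved and the AttributeError from the question should not come up.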

Old answer

I can't reproduce your problem.

shots = pd.DataFrame({'shots': [2, 1, pd.NA]}, dtype=pd.Int32Dtype())
idx = [2]

>>> shots
   shots
0      2
1      1
2   <NA>

>>> shots.dtypes
shots    Int32
dtype: object

>>> shots.loc[idx, 'shots'].fillna(1)
2    1
Name: shots, dtype: Int32
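
If you really do need to handle a single scalar, as in max(shots.loc['bar', 'shots'], 1), the value has to be tested for missingness before it is compared. max() compares its arguments, a comparison with pd.NA yields pd.NA, and truth-testing pd.NA raises the TypeError shown in the question. A minimal sketch, where `floor_at_one` is a hypothetical helper name and not part of pandas:

```python
import pandas as pd


def floor_at_one(value):
    """Return 1 for a missing value, otherwise max(value, 1). Hypothetical helper."""
    # pd.isna() returns a plain bool for a scalar, so it is safe to use in an `if`.
    if pd.isna(value):
        return 1
    return max(value, 1)


shots = pd.DataFrame(
    {"shots": [2, pd.NA, 0]},
    index=["foo", "bar", "baz"],
    dtype="Int32",
)

# Scalar access: each lookup returns either an integer or pd.NA.
results = {label: floor_at_one(shots.loc[label, "shots"]) for label in shots.index}
print(results)

# For comparison, the unguarded call fails as described in the question.
try:
    max(pd.NA, 1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

For more than a handful of values, a column-wide operation on the Series is usually the better fit than calling a helper like this per cell.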

Versions:

Python 3.9.7
Pandas 1.4.1
NumPy 1.21.5

python

nullable

pandas

fillna