How to use the zip function to generate cell locations from row and column information

Question

I created the following DataFrame and want to identify the cells that are null:

import pandas as pd
import numpy as np
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

     a    b     c
0    1    2     3
1   10   NaN    
2   10         NaN

I used the following code, x1 = np.where(pd.isnull(df)), and got this result:

print(x1)
(array([1, 2], dtype=int64), array([1, 2], dtype=int64))

However, I want to explicitly generate the cell location for each entry that is NaN. I tried the zip function, but got the following error:

print(set(zip(x1)))



 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 print(set(zip(x1)))

TypeError: unhashable type: 'numpy.ndarray'

What is the correct way to explicitly generate the location information from x1?

Answer 1

You can use numpy.where and map the column positions back to column labels:

import numpy as np
null_indices, col_idx = np.where(df.isna())
null_columns = df.columns[col_idx]

Output:

(array([1, 2], dtype=int64), Index(['b', 'c'], dtype='object'))

If you want them as tuples, you can zip the two together:

out = list(zip(null_indices, null_columns))

Output:

[(1, 'b'), (2, 'c')]

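For your specific code, x1 is a tuple of two arrays, so zip(x1) yields one-element tuples that each wrap a whole array, and set() then fails because arrays are not hashable. You need to unpack the tuple into zip, for example: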

out = list(zip(*x1))
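To make the difference concrete, here is a small sketch that rebuilds the DataFrame from the question and compares zip(x1) with zip(*x1). The names wrapped, positions and labels are only illustrative, and the int() conversion is an optional step I added so the result holds plain Python integers.

```python
import numpy as np
import pandas as pd

# Same data as in the question (np.nan spelled in lowercase).
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

x1 = np.where(pd.isnull(df))  # tuple: (row positions, column positions)

# zip(x1) iterates over the tuple itself, so every item is a 1-tuple
# wrapping an entire array; putting those in a set raises TypeError.
wrapped = list(zip(x1))

# zip(*x1) unpacks the tuple and pairs each row position with its
# column position.
positions = [(int(r), int(c)) for r, c in zip(*x1)]
print(positions)  # [(1, 1), (2, 2)]

# Optionally translate positions into index/column labels.
labels = [(df.index[r], df.columns[c]) for r, c in positions]
print(labels)  # [(1, 'b'), (2, 'c')]
```

Since the tuples now contain only hashable scalars, set(positions) works as well if a set is what you are after.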

Answer 2

Try stack, then take the index of the NaN entries (x != x is True only for NaN):

df.stack(dropna=False).loc[lambda x : x!=x].index.tolist()
Out[115]: [(1, 'b'), (2, 'c')]
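As far as I know, the dropna argument of stack is being phased out in newer pandas releases, so here is a variant of the same idea that does not depend on it. It stacks the boolean mask from isna() instead of the data, which is an assumption-level rewrite rather than the original answer's code; mask and nan_cells are illustrative names.

```python
import numpy as np
import pandas as pd

data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': np.nan, 'c': ""},
        {'a': 10, 'b': "", 'c': np.nan}]
df = pd.DataFrame(data)

# Stack the True/False mask rather than the values: the mask contains
# no NaN, so nothing can be dropped regardless of stack's defaults.
mask = df.isna().stack()

# Keep only the True entries; their MultiIndex holds (row, column) labels.
nan_cells = mask[mask].index.tolist()
print(nan_cells)  # [(1, 'b'), (2, 'c')]
```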
