Pandas nunique() for categorical columns only, null otherwise?

Question

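I want to check the number of unique values in the categorical columns of a DataFrame. df.nunique() computes the unique count for every column, which takes a long time. To make it faster, I want to skip any numeric columns. However, I still want the output to be a full Series containing all columns, just with Null for the numeric ones (and without computing anything for those columns).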

I have been experimenting with df._get_numeric_data(), sets, and df.nunique(), but have not gotten the output I am after.

So, given this input:

col_name type
col1    object
col2    object
col3    float64
col4    float64
col5    float64
col6    object
col7    float64
col8    object
col9    object

Desired output:

col_name    nunique
col1    23
col2    3
col3    null
col4    null
col5    null
col6    4
col7    null
col8    6
col9    2

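The key here is to avoid the cost of counting unique values for the float columns, and to do it in a streamlined, idiomatic pandas way...

Thanks!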

Answer 1

MCVE

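import numpy as np
import pandas as pd

# 100 rows of random integers in nine columns named col1..col9
df = pd.DataFrame(
       np.random.randint(1, 100, (100, 9)), columns=[f'col{i}' for i in range(1, 10)])

# Cast some columns to object so they stand in for the categorical columns
df[['col1', 'col2', 'col6', 'col8', 'col9']] = \
    df[['col1', 'col2', 'col6', 'col8', 'col9']].astype(object)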

>>> df.dtypes
col1    object
col2    object
col3     int32
col4     int32
col5     int32
col6    object
col7     int32
col8    object
col9    object
dtype: object

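You can use the exclude parameter of select_dtypes to leave all numeric columns out of the computation. Reindexing the result against df.columns then puts the skipped columns back as NaN.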

df.select_dtypes(exclude='number').nunique().reindex(df.columns)

col1    62.0
col2    63.0
col3     NaN
col4     NaN
col5     NaN
col6    63.0
col7     NaN
col8    65.0
col9    61.0
dtype: float64
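The counts above come back as float64 because reindex fills the skipped columns with NaN. If integer counts with a missing-value marker are preferred, one option is to cast the result to pandas' nullable Int64 dtype. This is a minimal sketch on a small made-up frame (the column values here are illustrative, not from the question):

```python
import pandas as pd

# Small illustrative frame: two object columns, two numeric columns
df = pd.DataFrame({
    "col1": pd.Series(["a", "b", "a", "c"], dtype=object),
    "col2": [1.0, 2.0, 2.0, 3.5],
    "col3": pd.Series(["x", "x", "y", "x"], dtype=object),
    "col4": [10, 20, 30, 40],
})

# Count unique values only for non-numeric columns, then restore all columns
counts = df.select_dtypes(exclude="number").nunique().reindex(df.columns)

# Cast to the nullable integer dtype so NaN becomes <NA> and counts stay integers
counts_int = counts.astype("Int64")
print(counts_int)
```

The numeric columns are never passed to nunique(), so no work is spent on them; the cast only changes how the result is displayed and stored.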

You can fiddle with the include and exclude parameters of select_dtypes to match exactly the columns you want to include.
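As a sketch of that kind of tuning: exclude='number' still keeps boolean (and datetime) columns, so if only object and category columns should be counted, include can be used instead. The frame and column names below are made up for illustration:

```python
import pandas as pd

# Illustrative frame with object, category, bool and float columns
df = pd.DataFrame({
    "name": pd.Series(["a", "b", "a", "c"], dtype=object),
    "grade": pd.Categorical(["lo", "hi", "lo", "lo"]),
    "flag": [True, False, True, True],
    "score": [1.5, 2.5, 2.5, 4.0],
})

# Keep only object and category columns; bool and float columns are skipped
wanted = df.select_dtypes(include=["object", "category"])

# Count uniques on the selected columns, then restore the full column index
counts = wanted.nunique().reindex(df.columns)
print(counts)
```

With exclude='number' the flag column would have been counted as well; using include makes the selection explicit.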
