Not getting stats analysis of binary column pandas

Question

I have a data frame with 11 columns and 18k rows. The last column is either 1 or 0, but when I use .describe(), all I get is

count     19020
unique        2
top           1
freq      12332
Name: Class, dtype: int64

as opposed to an actual statistical analysis with mean, std, etc.

Is there a way to do this?

Answer 1

You can use

# percentile list
perc = [.20, .40, .60, .80]

# list of dtypes to include
include = ['object', 'float', 'int']

data.describe(percentiles=perc, include=include)

where data is your data frame.
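To make the effect of those two arguments concrete, here is a minimal sketch on a made-up frame. The column names x and label are hypothetical, and include="all" is swapped in for the dtype list above only to keep the toy example independent of platform integer sizes.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one text column.
data = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "label": ["a", "b", "a", "a", "b"],
})

# Custom percentiles to report instead of the default quartiles.
perc = [0.20, 0.40, 0.60, 0.80]

# include="all" describes every column regardless of dtype, so the result
# has both the numeric rows (mean, std, ...) and the categorical-style rows
# (unique, top, freq); cells that do not apply to a column are NaN.
summary = data.describe(percentiles=perc, include="all")
print(summary)
```

Note that this only controls which columns are summarized. A column that is not stored as a numeric dtype still gets count/unique/top/freq rather than mean and std.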

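Since you're new to Stack, I'd suggest including some actual code (i.e. code that shows how you are calling the method and in what context). You'll get better answers.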

Answer 2

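If .describe() is not automatically picking up your numeric (0, 1) column, it is probably because the column is not actually encoded as an int dtype. You can see this in the documentation for the .describe() method, which tells you that the default include parameter only covers numeric types: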

None (default) : The result will include all numeric columns.

My suggestion is as follows:


df.dtypes # check datatypes
df['num'] = df['num'].astype(int) # if it's not integer, cast it as such

df.describe(include=['object', 'int64']) # explicitly state the data types you'd like to describe

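That is, first check the data types (I have assumed the column is named num and the data frame df, but feel free to substitute the correct names). If this indicator (0, 1) column is indeed not encoded as an int/integer type, cast it with .astype(int). You are then free to use df.describe(), and you can even specify which data types to include in the describe output for more fine-grained control.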
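Here is a small self-contained sketch of that check-then-cast workflow. It assumes the 0/1 column is stored as a category dtype, which is one way to end up with the count/unique/top/freq output shown in the question; the toy values and the feature column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; "Class" stands in for the asker's 0/1 column,
# stored here as a categorical (an assumption about the real data).
df = pd.DataFrame({
    "feature": [0.5, 1.5, 2.5, 3.5],
    "Class": pd.Categorical([1, 0, 1, 1]),
})

# Step 1: inspect the dtypes; "Class" shows up as category, not int64.
print(df.dtypes)

# A non-numeric column is summarized with count/unique/top/freq.
before = df["Class"].describe()
print(before)

# Step 2: cast to an integer dtype, then describe again.
df["Class"] = df["Class"].astype(int)
after = df["Class"].describe()
print(after)  # now reports mean, std, min, quartiles, max
```

For a 0/1 column the mean is simply the share of ones, so with the counts in the question (12332 ones out of 19020 rows) you would expect a mean of roughly 0.648 after the cast.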

python

describe

dataframe

pandas