Trying to replicate Excel's Descriptive Statistics analysis tool in Python / adding mode to the describe() function
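I am trying to replicate Excel's Descriptive Statistics (summary statistics) analysis tool in Python (Jupyter Notebook) by aggregating some of the descriptive statistics available in pandas, but every time I add the mode function to my code, it always returns: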
ValueError: cannot combine transform and aggregation operations
My code is:
df2 = df[["pm10", "so2", "co", "o3", "no2"]]
df2.agg(
    {
        "pm10": ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", "mode"],
        "so2": ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", "mode"],
        "co": ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", "mode"],
        "o3": ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", "mode"],
        "no2": ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", "mode"],
    }
)
The error occurs only when "mode" is included; the other functions work fine. This is my dataset
The result I want:
Try using the mode function from Python's statistics module:
from statistics import mode

func_list = ["mean", "sem", "median", "std", "var", "kurt", "skew", "min", "max", "sum", "count", mode]
df2.agg(
    {
        "pm10": func_list,
        "so2": func_list,
        "co": func_list,
        "o3": func_list,
        "no2": func_list,
    }
)
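A likely explanation for the error, sketched here with made-up numbers: unlike mean or median, Series.mode() returns a Series holding every tied mode rather than a single scalar, so pandas does not treat the string "mode" as a reducing aggregation. statistics.mode, by contrast, returns one value. The series s below is purely illustrative and is not taken from the dataset in the question; the tie behaviour of statistics.mode shown here assumes Python 3.8 or later.

```python
import pandas as pd
from statistics import mode

# Made-up values containing a tie between 1 and 2 (illustration only).
s = pd.Series([1, 1, 2, 2, 3])

mean_value = s.mean()    # reduces the Series to one scalar
mode_values = s.mode()   # returns a Series with all tied modes
single_mode = mode(s)    # statistics.mode returns a single value

print(type(mean_value).__name__)
print(mode_values.tolist())
print(single_mode)
```

Because statistics.mode yields exactly one value per column, it can sit in the same function list as the other reducers.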
This is not the most concise approach, but it works. I also added a few other measures, such as the NaN count and the range.
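import pandas as pd

df2 = df[["pm10", "so2", "co", "o3", "no2"]]

def describe(df2, stats):
    # Stack the extra statistics under the default describe() rows
    # (pd.concat also works on pandas 2.x, where DataFrame.append no longer exists).
    d = df2.describe()
    return pd.concat([d, df2.reindex(d.columns, axis=1).agg(stats)])

df2_desc = describe(df2, ["median", "var", "sem", "kurt", "skew", "sum"])
# Number of missing values per column, added as a row named "nans".
count_nan = df2.isnull().sum(axis=0)
df2_append = pd.concat([df2_desc, count_nan.rename("nans").to_frame().T])
# mode() returns one row per tied mode, indexed 0, 1, ...
df_mode = df2.mode(axis=0, numeric_only=True, dropna=True)
df2_concat = pd.concat([df2_append, df_mode])
df2_concat.loc["range"] = df2_concat.loc["max"] - df2_concat.loc["min"]
df2_concat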
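For comparison, here is a more compact sketch that builds each row directly from DataFrame methods instead of going through agg. The helper name excel_style_summary, the row labels, and the sample values are all assumptions made for illustration; the row order only loosely follows Excel's output. When several values tie for the mode, this sketch keeps the smallest one, which may differ from what Excel reports.

```python
import pandas as pd

def excel_style_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Build a statistics-by-column table for the numeric columns of frame."""
    num = frame.select_dtypes("number")
    rows = {
        "mean": num.mean(),
        "standard error": num.sem(),
        "median": num.median(),
        # DataFrame.mode() lists tied modes in ascending order; row 0 is the smallest.
        "mode": num.mode(dropna=True).iloc[0],
        "standard deviation": num.std(),
        "sample variance": num.var(),
        "kurtosis": num.kurt(),
        "skewness": num.skew(),
        "range": num.max() - num.min(),
        "minimum": num.min(),
        "maximum": num.max(),
        "sum": num.sum(),
        "count": num.count(),
    }
    # Each dict value is a Series indexed by column name; transpose so
    # that statistics become rows and the original columns stay columns.
    return pd.DataFrame(rows).T

# Made-up sample values, for illustration only.
sample = pd.DataFrame(
    {
        "pm10": [1.0, 2.0, 2.0, 5.0, 7.0],
        "so2": [3.0, 3.0, 4.0, 6.0, None],
    }
)
summary = excel_style_summary(sample)
print(summary)
```

With the real data, the call would presumably be excel_style_summary(df2), where df2 is the five-column frame from the question.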