Using the bin feature in a crosstabulation
labs = ['small', 'medium', 'big', 'large']
df['size'] = pd.qcut(df.volume, 4, labels=labs)
pd.crosstab(df.size, df.cut, margins=True, normalize='columns')
# cut and volume are columns/features of the DataFrame df
Above is the snippet I tried to run. This is the output I got:
cut     Fair  Good  Ideal  Premium  Very Good  All
row_0
539430   1.0   1.0    1.0      1.0        1.0  1.0
But I want ['small','medium','big','large'] as the index.
How can I get them as the index?
I also tried changing the type of df.size from category to string. That did not work.
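I think you need to swap the columns. Also, if a column name collides with an existing pandas attribute or method, such as DataFrame.size, it is better to use [] instead of dot notation. Here df.size returns the number of elements in the DataFrame, not the 'size' column, which is why the crosstab has a single row labelled with that number: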
import pandas as pd

df = pd.DataFrame({'cut': ['Fair', 'Good'] * 3, 'volume': [1, 5, 10, 29, 30, 2]})
labs = ['small', 'medium', 'big', 'large']
df['size'] = pd.qcut(df.volume, 4, labels=labs)
# df.size is the DataFrame attribute: 18 values (6 rows x 3 columns)
print (df.size)
18
# so the scalar 18 is passed to crosstab instead of the 'size' column
df1 = pd.crosstab(df.size, df.cut, margins=True, normalize='columns')
print (df1)
cut    Fair  Good  All
row_0
18      1.0   1.0  1.0
# bracket notation selects the actual 'size' column
df2 = pd.crosstab(df['cut'], df['size'], margins=True, normalize='columns')
print (df2)
size  small  medium  big  large  All
cut
Fair    0.5     0.0  1.0    0.5  0.5
Good    0.5     1.0  0.0    0.5  0.5
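Since the question asks for the bins as the index rather than as the columns, the same call can probably be written with df['size'] as the first argument. The following is only a sketch on the sample data above, not output from the original dataset; the names df3 and shadowed are made up for illustration, and the hasattr check is just one possible way to spot column names that collide with DataFrame attributes.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({'cut': ['Fair', 'Good'] * 3,
                   'volume': [1, 5, 10, 29, 30, 2]})
labs = ['small', 'medium', 'big', 'large']
df['size'] = pd.qcut(df['volume'], 4, labels=labs)

# df.size is the DataFrame attribute (rows * columns), not the column
n_cells = df.size          # 18 for 6 rows x 3 columns
size_col = df['size']      # the binned categorical column

# Bins as the row index, cut as the columns (df3 is an illustrative name)
df3 = pd.crosstab(df['size'], df['cut'], margins=True, normalize='columns')
print(df3)

# Illustrative check: column names that collide with DataFrame attributes
# and therefore should not be accessed with dot notation
shadowed = [c for c in df.columns if hasattr(pd.DataFrame, c)]
print(shadowed)
```

With normalize='columns', each column of df3 sums to 1, so every cell is the share of that cut falling into the given size bin.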