Mosaic plot with percentage and count values as labels in a pandas DataFrame
I have a pandas DataFrame like this:
LEVEL_1 LEVEL_2 Freq Percentage
0 HIGH HIGH 8842 17.684
1 AVERAGE LOW 2802 5.604
2 LOW LOW 22198 44.396
3 AVERAGE AVERAGE 6804 13.608
4 LOW AVERAGE 2030 4.060
5 HIGH AVERAGE 3666 7.332
6 AVERAGE HIGH 2887 5.774
7 LOW HIGH 771 1.542
I can get the tiles for LEVEL_1 and LEVEL_2:
from statsmodels.graphics.mosaicplot import mosaic
mosaic(df, ['LEVEL_1','LEVEL_2'])
I just want to put Freq and Percentage in the center of each mosaic tile.
How can I do that?
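Here is a start. Note that I had to add a row of zeros to the DataFrame for the labeling to work, because the HIGH/LOW combination does not occur in the data. You can make the labels look nicer with string formatting in the lambda
function. You will also need to reorder the headers.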
import io

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

# Whitespace-separated copy of the data from the question
data = io.StringIO("""LEVEL_1 LEVEL_2 Freq Percentage
HIGH HIGH 8842 17.684
AVERAGE LOW 2802 5.604
LOW LOW 22198 44.396
AVERAGE AVERAGE 6804 13.608
LOW AVERAGE 2030 4.060
HIGH AVERAGE 3666 7.332
AVERAGE HIGH 2887 5.774
LOW HIGH 771 1.542""")
df = pd.read_csv(data, sep=r'\s+')

# HIGH/LOW is missing from the data; add it with zeros so the labelizer
# can look up every tile (DataFrame.append was removed in pandas 2.0)
missing = pd.DataFrame([{'LEVEL_1': 'HIGH', 'LEVEL_2': 'LOW', 'Freq': 0, 'Percentage': 0.0}])
df = pd.concat([df, missing], ignore_index=True)
df = df.sort_values(['LEVEL_1', 'LEVEL_2'])
df = df.set_index(['LEVEL_1', 'LEVEL_2'])
print(df)

# The labelizer receives each tile's key, e.g. ('HIGH', 'LOW'), and should return a string
mosaic(df['Freq'], labelizer=lambda k: str(df.loc[k].values))
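The two follow-ups mentioned above (nicer label formatting and reordering the categories) could be sketched roughly as follows. This is only one possible approach, not necessarily what the answer had in mind. The names `ORDER`, `table`, `tile_label` and `plot_mosaic` are made up for illustration, the LOW, AVERAGE, HIGH order is an assumption about the desired layout, and the sketch relies on `mosaic` laying out categories in the order they first appear in the data, which is worth checking against your statsmodels version.

```python
import pandas as pd

# Data from the question as (LEVEL_1, LEVEL_2, Freq, Percentage) rows
rows = [
    ("HIGH", "HIGH", 8842, 17.684),
    ("AVERAGE", "LOW", 2802, 5.604),
    ("LOW", "LOW", 22198, 44.396),
    ("AVERAGE", "AVERAGE", 6804, 13.608),
    ("LOW", "AVERAGE", 2030, 4.060),
    ("HIGH", "AVERAGE", 3666, 7.332),
    ("AVERAGE", "HIGH", 2887, 5.774),
    ("LOW", "HIGH", 771, 1.542),
]
df = pd.DataFrame(rows, columns=["LEVEL_1", "LEVEL_2", "Freq", "Percentage"])

# Assumed display order; plain sorting would give AVERAGE, HIGH, LOW instead
ORDER = ["LOW", "AVERAGE", "HIGH"]

# Reindex over every LEVEL_1 x LEVEL_2 combination in that order. The one
# combination missing from the data (HIGH, LOW) is filled with zeros.
full_index = pd.MultiIndex.from_product([ORDER, ORDER], names=["LEVEL_1", "LEVEL_2"])
table = df.set_index(["LEVEL_1", "LEVEL_2"]).reindex(full_index, fill_value=0)


def tile_label(key):
    """Return the count and the percentage on two lines for one tile key."""
    freq = table.loc[key, "Freq"]
    pct = table.loc[key, "Percentage"]
    if freq == 0:
        return ""  # leave empty tiles unlabeled
    return f"{freq}\n{pct:.1f}%"


def plot_mosaic():
    """Draw the mosaic plot; returns whatever mosaic() returns."""
    # Imported here so the label logic can be tried without statsmodels
    from statsmodels.graphics.mosaicplot import mosaic
    return mosaic(table["Freq"], labelizer=tile_label)
```

Calling `plot_mosaic()` followed by `matplotlib.pyplot.show()` should display the figure. Building the full index with `reindex` takes the place of appending the zero row by hand, and returning an empty string keeps the zero-width HIGH/LOW tile from getting a stray label.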