Groupby / sort_values when looking at medical data

Hi - I'm a veterinarian and I'm trying to look at some medical data in a dataframe. The df consists of 100k rows(!), including columns named 'ClinicName', 'Induction Agent', and 'Complication Present' (1 = True, 0 = False). Example below:

Example rows

ClinicName Induction Agent Complication Present
Redhill Propofol 1
Christchurch Alfaxan 0
Redhill Propofol 1
Worcester Propofol 0
Christchurch Alfaxan 0
Derby Propofol 0
Worcester Alfaxan 1
Derby Propofol 0
Redhill Propofol 1

I want to create a normalised horizontal bar graph showing whether a complication was present or not for each ClinicName, sub-grouped by the type of induction agent. I have successfully done this in the form:

complication_by_clinic = (df.groupby(['ClinicName', 'Induction Agent'])['Complication Present']
    .value_counts(normalize=False, sort=True, ascending=True, bins=None, dropna=True).unstack().tail(10))

complication_by_clinic.plot(kind='barh', stacked=True, figsize=[20,5], colormap='winter')

However, what I really need is to use sort_values so that the normalised values are ordered either ascending or descending, and for the induction agents in the bar graph to be coloured differently from each other. Then I want to be able to remove all clinics' data with a normalised value of less than a certain amount (say 0.1) by using df.drop.

(To give some background: the reason is that on Chi-squared analysis at the moment, the values for induction agent and Complication Present of 0 are significantly skewing the data, since some clinics are not entering data regularly.)

Something like this for sorting the values is needed but I can't get it right:

complication_by_clinic = df.sort_values(df.groupby(['ClinicName', 'Induction Agent'])['Complication Present'].sum()

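But I'm also still stuck on colouring the 'Induction Agent' differently in the bar graph. Any help is greatly appreciated - feel free to ask me any questions about your pet in return!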

This prtscr link shows how it currently displays: [1]: https://i.stack.imgur.com/wZB8F.png and this is how I would like it to look: https://1drv.ms/u/s!Ajl7cdyxWsko6Qu6lZZDEVcHgDaa?e=3sShAK

[Here are some additional print screens that may be useful

https://1drv.ms/w/s!Ajl7cdyxWsko6QxSYdylu-3CoC6H?e=hR1BfS]

For the first part:

complication_by_clinic.sort_values(['ClinicName', 'Induction Agent'], ascending=True).plot(kind='barh', stacked=True, figsize=[20,5], colormap='winter')

Edit:

Sorry, it should be:

complication_by_clinic['sum'] = complication_by_clinic.sum(axis=1)
complication_by_clinic.sort_values(by='sum', ascending=True).drop(columns='sum').plot(kind='barh', stacked=True, figsize=[20,5], colormap='winter')
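
For the sorting and threshold parts of the question, here is a rough sketch of a different route rather than the row-sum trick above. It assumes that "normalised value" means the proportion of rows with 'Complication Present' == 1 in each (ClinicName, Induction Agent) group, which for a 0/1 column is just the group mean. The sample frame is rebuilt from the rows in the question, and the names rate, rate_sorted, rate_kept, by_agent and threshold are made up for illustration.

```python
import pandas as pd

# Rows copied from the example in the question (stand-in for the full 100k-row df).
df = pd.DataFrame({
    "ClinicName": ["Redhill", "Christchurch", "Redhill", "Worcester", "Christchurch",
                   "Derby", "Worcester", "Derby", "Redhill"],
    "Induction Agent": ["Propofol", "Alfaxan", "Propofol", "Propofol", "Alfaxan",
                        "Propofol", "Alfaxan", "Propofol", "Propofol"],
    "Complication Present": [1, 0, 1, 0, 0, 0, 1, 0, 1],
})

# Assumption: the normalised value is the share of rows with a complication.
# The column is coded 0/1, so the group mean is exactly that share.
rate = (
    df.groupby(["ClinicName", "Induction Agent"])["Complication Present"]
    .mean()
    .rename("complication_rate")
)

# Order the groups by their complication rate (ascending=False for descending).
rate_sorted = rate.sort_values(ascending=True)

# Keep only groups at or above the cut-off; boolean indexing avoids df.drop.
threshold = 0.1  # hypothetical cut-off taken from the "say 0.1" in the question
rate_kept = rate_sorted[rate_sorted >= threshold]

# One column per induction agent, one row per clinic, ready for plotting.
by_agent = rate_kept.unstack("Induction Agent")
print(by_agent)
```

Calling by_agent.plot(kind='barh', figsize=[20,5], colormap='winter') should then draw one bar series per induction agent, each in its own colour, because pandas colours a bar plot per column. Note that unstack re-sorts the clinics alphabetically, so if the plotted order matters you would need to sort by_agent again by whichever column you care about.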