Pandas: how to apply functions per subgroup
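I have a simple dataframe with nationality, occupation and age columns.
Nationality is encoded as 0, 1, 2, representing EU, US and Asia.
For each occupation, I would like to find the percentage of each nationality,
e.g. 67% of doctors are European and 33% are Asian.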
import pandas as pd
import numpy as np
# create the dataframe: random nationality codes (0-2) and ages (24-69)
df = pd.DataFrame(np.concatenate((np.random.randint(low=0, high=3, size=(10, 1)), np.random.randint(low=24, high=70, size=(10, 1))), axis=1))
df.columns = ['nationality', 'age']
df['occupation'] = ['teacher'] * 2 + ['engineer'] * 3 + ['doctor'] * 3 + ['lawyer'] * 2
   nationality  age occupation
0            0   65    teacher
1            0   31    teacher
2            0   30   engineer
3            2   63   engineer
4            0   28   engineer
5            1   27     doctor
6            0   52     doctor
7            0   60     doctor
8            0   33     lawyer
9            0   38     lawyer
df.groupby(['occupation','nationality']).count()
def iseuropean(x):
    if x == 0:
        return 1
    else:
        return 0

def isamerican(x):
    if x == 1:
        return 1
    else:
        return 0

def isasian(x):
    if x == 2:
        return 1
    else:
        return 0
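With groupby I can get the counts, but I would like to apply a function to each occupation group to determine the percentages. I haven't figured out how to do that, though.
Any help would be appreciated.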
I assume you are looking for the percentage of each nationality within each occupation:
In [11]: c = df.groupby(['occupation','nationality'])["age"].count().rename("count")
In [12]: c
Out[12]:
occupation  nationality
doctor      0              2
            1              1
engineer    0              2
            2              1
lawyer      0              2
teacher     0              2
Name: count, dtype: int64
In [13]: c / c.sum() # proportion of each, maybe not very useful
Out[13]:
occupation  nationality
doctor      0              0.2
            1              0.1
engineer    0              0.2
            2              0.1
lawyer      0              0.2
teacher     0              0.2
Name: count, dtype: float64
In [14]: c / c.groupby(level=0).sum() # proportion of each occupation
Out[14]:
occupation  nationality
doctor      0              0.666667
            1              0.333333
engineer    0              0.666667
            2              0.333333
lawyer      0              1.000000
teacher     0              1.000000
Name: count, dtype: float64
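The same per-occupation proportions can also be computed in a single step. The following is only a sketch of an alternative to the division above, not the method used in this answer: it swaps in `pd.crosstab` with `normalize="index"` and `value_counts(normalize=True)` on the grouped column. It assumes the sample values printed in the question, hard-coded here instead of randomly generated so that the result is reproducible.

```python
import pandas as pd

# Sample data copied from the question's printed dataframe (hard-coded, not random).
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "age": [65, 31, 30, 63, 28, 27, 52, 60, 33, 38],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Wide format: one row per occupation, one column per nationality code.
# normalize="index" divides each row by its row total, so every row sums to 1.
shares = pd.crosstab(df["occupation"], df["nationality"], normalize="index")

# Long format: a Series indexed by (occupation, nationality), holding only
# the combinations that occur in the data.
long_shares = df.groupby("occupation")["nationality"].value_counts(normalize=True)

print(shares)
print(long_shares)
```

The wide table also shows zero shares (for example, lawyers with nationality code 1), which the long Series omits; which form is more convenient depends on what is done with the result afterwards.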
Also, you may want to use a categorical rather than the is_XXX functions:
In [21]: pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])
Out[21]:
[european, european, european, asian, european, american, european, european, european, european]
Categories (3, object): [european, american, asian]
In [22]: df.nationality = pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])
In [23]: df
Out[23]:
  nationality  age occupation
0    european   65    teacher
1    european   31    teacher
2    european   30   engineer
3       asian   63   engineer
4    european   28   engineer
5    american   27     doctor
6    european   52     doctor
7    european   60     doctor
8    european   33     lawyer
9    european   38     lawyer
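Once the column is categorical, the percentage calculation can be repeated with readable labels. This is a minimal sketch, again assuming the sample values printed in the question (hard-coded here); the names `counts`, `totals` and `pct` are illustrative and do not appear in the original session.

```python
import pandas as pd

# Sample data copied from the question's printed dataframe (hard-coded, not random).
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "age": [65, 31, 30, 63, 28, 27, 52, 60, 33, 38],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Map the integer codes 0, 1, 2 to labelled categories.
df["nationality"] = pd.Categorical.from_codes(
    df["nationality"], categories=["european", "american", "asian"]
)

# Count rows per (occupation, nationality); observed=True keeps only the
# combinations that actually occur in the data.
counts = df.groupby(["occupation", "nationality"], observed=True).size()

# Total per occupation, broadcast back onto the same index as counts.
totals = counts.groupby(level="occupation").transform("sum")

# Share of each nationality within its occupation, as a percentage.
pct = (counts / totals * 100).round(1)
print(pct)
```

Using `transform("sum")` keeps the totals on the same index as the counts, so the division does not rely on index alignment between a two-level and a one-level Series.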