Create a new variable from existing variables efficiently in Python
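I am trying to recode a variable. I have been able to do this with map, but I am looking for an efficient way to recode multiple values (a, b, c) into a single value. In the example below, I have three different categories for Asian and want to recode them accordingly. I tried using a boolean operator, but I get the following error.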
df['Race'] = df['Race'].map({
'Black or African American' : 'Black',
'White' : 'White',
'Hispanic or Latino': 'Non-White Hispanic',
('Asian' | 'Asian/Indian/Pacific Islander' | 'Native Hawaiian or Other Pacific Islander') : 'Asian/Pacific Islander',
('American Indian or Alaska Native' | 'Other/Mixed') : 'Multiracial/other',
'Unspecified' : np.nan
})
TypeError: unsupported operand type(s) for |: 'str' and 'str'
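Is there a simpler, yet still efficient, way to recode multiple values into a single value? It doesn't have to be map; that is just what I am most familiar with.
How about using a dict comprehension and unpacking: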
df['Race'] = df['Race'].map({
'Black or African American' : 'Black',
'White' : 'White',
'Hispanic or Latino': 'Non-White Hispanic',
**{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
**{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Other/Mixed')},
'Unspecified' : np.nan
})
In fact, simply listing each key would do:
df['Race'] = df['Race'].map({
'Black or African American' : 'Black',
'White' : 'White',
'Hispanic or Latino': 'Non-White Hispanic',
'Asian': 'Asian/Pacific Islander',
'Asian/Indian/Pacific Islander': 'Asian/Pacific Islander',
'Native Hawaiian or Other Pacific Islander': 'Asian/Pacific Islander',
'American Indian or Alaska Native': 'Multiracial/other',
'Other/Mixed': 'Multiracial/other',
'Unspecified' : np.nan
})
Or with dict.fromkeys and unpacking:
df['Race'] = df['Race'].map({
'Black or African American' : 'Black',
'White' : 'White',
'Hispanic or Latino': 'Non-White Hispanic',
'Unspecified' : np.nan,
**dict.fromkeys(['Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander'], 'Asian/Pacific Islander'),
**dict.fromkeys(['American Indian or Alaska Native', 'Other/Mixed'], 'Multiracial/other'),
})
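The answers above all end up with the same flat dictionary from raw value to recoded label. As a minimal sketch of another way to build it, one could write the groups as "label: list of raw values" and invert that. The names `groups` and `recode` and the four-row sample frame are made up for illustration and do not come from the original posts.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: each target label lists the raw values it absorbs.
groups = {
    'Black': ['Black or African American'],
    'White': ['White'],
    'Non-White Hispanic': ['Hispanic or Latino'],
    'Asian/Pacific Islander': [
        'Asian',
        'Asian/Indian/Pacific Islander',
        'Native Hawaiian or Other Pacific Islander',
    ],
    'Multiracial/other': ['American Indian or Alaska Native', 'Other/Mixed'],
}

# Invert to the flat "raw value -> label" dict that Series.map expects.
recode = {raw: label for label, raws in groups.items() for raw in raws}
recode['Unspecified'] = np.nan  # treat 'Unspecified' as missing

# Small sample frame, for illustration only.
df = pd.DataFrame({'Race': ['Asian', 'Other/Mixed', 'White', 'Unspecified']})
df['Race'] = df['Race'].map(recode)
print(df)
```

With this layout, adding a raw category means appending one string to the relevant list. The call to `map` is the same as in the answers above.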
Use apply for readability.
import numpy as np
import pandas as pd

race = [
    'Black or African American',
    'White',
    'Hispanic or Latino',
    'Asian',
    'Asian/Indian/Pacific Islander',
    'Native Hawaiian or Other Pacific Islander',
    'American Indian or Alaska Native',
    'Other/Mixed',
    'Unspecified'
]
df = pd.DataFrame({'Race': race})

# Build the lookup table once instead of on every call.
dictLookup = {
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    'Unspecified': np.nan,
    **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
    **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Alaska Native', 'Other/Mixed')}
}

def lookup(x):
    # Raises KeyError if x is not a key of dictLookup.
    return dictLookup[x]

df['Race'] = df['Race'].apply(lookup)
print(df.head(20))
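One difference between the map and apply approaches is what happens to a value that is not in the dictionary. The sketch below uses a deliberately tiny, made-up dictionary and series: `Series.map` with a plain dict turns unmapped values into NaN, while indexing the dict inside `apply` raises a KeyError.

```python
import pandas as pd

# Tiny illustrative mapping; 'Some new category' is intentionally absent.
recode = {'Asian': 'Asian/Pacific Islander', 'White': 'White'}
s = pd.Series(['Asian', 'White', 'Some new category'])

# map: values missing from the dict silently become NaN.
mapped = s.map(recode)

# apply with direct indexing: a missing key raises KeyError.
try:
    s.apply(lambda x: recode[x])
    raised = False
except KeyError:
    raised = True

# One option for keeping the original label where no mapping exists.
kept = s.map(recode).fillna(s)

print(mapped.isna().tolist(), raised, kept.tolist())
```

Note that `fillna(s)` would also restore values that were deliberately mapped to NaN (such as 'Unspecified' above), so it only fits when the dictionary contains no such entries.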