Python function on a DataFrame does not return the expected result
I wrote the following function to convert a variable into dummy variables:
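import pandas as pd

def convert_to_dummies(df, column):
    # dtype=int gives 0/1 columns; pandas >= 2.0 defaults to bool (True/False)
    dummies = pd.get_dummies(df[column], dtype=int)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1) #when dropping column don't forget "axis=1"
    return df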
But when I apply it to the categorical variables in df:
for col in ['col1', 'col2', ....]:
    convert_to_dummies(df, col)
* 'col1', 'col2', ... are categorical columns in df.
I get the original df back: none of the categorical variables is converted to dummy variables. What am I doing wrong?
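The function returns a new DataFrame; `df = ...` inside it only rebinds the local name, so the caller's df is never changed. You need to reassign the output: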
for col in ['col1', 'col2', ....]:
    df = convert_to_dummies(df, col)
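Why the reassignment matters, as a minimal sketch with a hypothetical function `rebind_only` (not part of the question's code): assigning to a parameter inside a function only rebinds the local name, so a caller that discards the return value keeps its original DataFrame. `pd.concat` and `df.drop` (without `inplace=True`) behave the same way: they return new objects instead of modifying the caller's frame.

```python
import pandas as pd

def rebind_only(df):
    # Hypothetical example function. assign() returns a NEW DataFrame;
    # binding it to `df` changes only the local name, not the caller's variable.
    df = df.assign(new_col=1)
    return df

original = pd.DataFrame({"col1": list("abc")})

rebind_only(original)                 # return value discarded
print("new_col" in original.columns)  # False: caller's frame is untouched

original = rebind_only(original)      # reassign, as in the answer
print("new_col" in original.columns)  # True
```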
Sample:
df = pd.DataFrame({'col1':list('abcdef'),
                   'col2':list('abadec'),
                   'col3':list('aaadee'),
                   'col4':list('aabbcc')})
print(df)
col1 col2 col3 col4
0 a a a a
1 b b a a
2 c a a b
3 d d d b
4 e e e c
5 f c e c
for col in ['col1', 'col2']:
    df = convert_to_dummies(df, col)
print(df)
col3 col4 a b c d e f a b c d e
0 a a 1 0 0 0 0 0 1 0 0 0 0
1 a a 0 1 0 0 0 0 0 1 0 0 0
2 a b 0 0 1 0 0 0 1 0 0 0 0
3 d b 0 0 0 1 0 0 0 0 0 1 0
4 e c 0 0 0 0 1 0 0 0 0 0 1
5 e c 0 0 0 0 0 1 0 0 1 0 0
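The repeated column names a, b, c, d, e above come from encoding each column without a prefix. A minimal sketch of a one-call alternative, assuming a pandas version whose `get_dummies` supports the long-standing `columns` and `dtype` parameters: passing the whole DataFrame with `columns=` encodes and drops the listed columns in a single step and prefixes each indicator with its source column name, so no names collide and no loop or reassignment is needed.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcdef'),
                   'col2': list('abadec'),
                   'col3': list('aaadee'),
                   'col4': list('aabbcc')})

# columns= selects which columns to encode; they are dropped and replaced by
# indicator columns named "<column>_<value>", appended after the kept columns.
encoded = pd.get_dummies(df, columns=['col1', 'col2'], dtype=int)
print(encoded.columns.tolist())
# ['col3', 'col4', 'col1_a', 'col1_b', 'col1_c', 'col1_d', 'col1_e', 'col1_f',
#  'col2_a', 'col2_b', 'col2_c', 'col2_d', 'col2_e']
```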
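If you need unique dummy columns (one column per value, merged across all selected columns), it is better to remove the loop and create all dummies at once: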
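def convert_to_dummies_cols(df, cols):
    #create all dummies once with all columns selected by subset
    #prefix='' and prefix_sep='' name the dummy columns by value only (a, b, ...)
    dummies = pd.get_dummies(df[cols], prefix='', prefix_sep='', dtype=int)
    #aggregate max over columns with the same name
    #(transposed, because groupby(axis=1) is deprecated in newer pandas)
    dummies = dummies.T.groupby(level=0).max().T
    #add to original df
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(cols, axis=1)
    return df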
#parameter is list of columns for dummies
#(start again from the original sample df; the loop above already dropped col1 and col2)
df = convert_to_dummies_cols(df, ['col1', 'col2'])
print(df)
col3 col4 a b c d e f
0 a a 1 0 0 0 0 0
1 a a 0 1 0 0 0 0
2 a b 1 0 1 0 0 0
3 d b 0 0 0 1 0 0
4 e c 0 0 0 0 1 0
5 e c 0 0 1 0 0 1
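An alternative sketch of the merged version that avoids column-wise grouping altogether: reshape the selected columns to long form with `melt(ignore_index=False)` (available since pandas 1.1), encode the single value column, then take the max per original row by grouping on the index. `dummies_merged` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def dummies_merged(df, cols):
    # Hypothetical helper. melt() stacks the selected columns into one
    # 'value' column; ignore_index=False keeps each row's original label.
    long_form = df[cols].melt(ignore_index=False)
    # One 0/1 indicator column per distinct value across all selected columns.
    indicators = pd.get_dummies(long_form['value'], dtype=int)
    # Collapse back to one row per original row: 1 if the value appears
    # in any of the selected columns for that row.
    merged = indicators.groupby(level=0).max()
    return df.drop(columns=cols).join(merged)

sample = pd.DataFrame({'col1': list('abcdef'),
                       'col2': list('abadec'),
                       'col3': list('aaadee'),
                       'col4': list('aabbcc')})
print(dummies_merged(sample, ['col1', 'col2']))
```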