Feature Engineering Salary Data using Categorical Column as a condition

I need to convert the salary amount to an annualized salary, depending on the value in a categorical column:

import pandas as pd

df = pd.DataFrame({'Name':['A','B','C','D','E'],
                  'sal_amt':[4500,50000,2000,3000,5000],
                  'sal_md':['M','Y','W','B','M']})
df.head()

# I defined a function that annualizes each row based on its sal_md code:

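def func(row):
    # Code meanings below are inferred from the multipliers.
    if row['sal_md'] == 'M':      # monthly: 12 periods per year
        return row['sal_amt'] * 12
    elif row['sal_md'] == 'Y':    # yearly: already annual
        return row['sal_amt']
    elif row['sal_md'] == 'H':    # hourly: 8760 = 24 * 365 hours per year
        return row['sal_amt'] * 8760
    elif row['sal_md'] == 'W':    # weekly: 52 periods per year
        return row['sal_amt'] * 52
    elif row['sal_md'] == 'B':    # biweekly: 26 periods per year
        return row['sal_amt'] * 26
    elif row['sal_md'] == 'S':    # returned unchanged
        return row['sal_amt']
    elif row['sal_md'] == 'A':    # returned unchanged
        return row['sal_amt']
    # Any other code falls through and returns None.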


df['sal_annual'] = df.apply(func,axis=1)

https://i.stack.imgur.com/INXva.png

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Name':['A','B','C','D','E'],
                      'sal_amt':[4500,50000,2000,3000,5000],
                      'sal_md':['M','Y','W','B','M']})

In [3]: multiplier_dict = {'M':12, 'Y':1, 'W':52, 'B':26}

In [4]: df['sal_multiplier'] = df.sal_md.map(multiplier_dict)

In [5]: df['sal_annual'] = df.sal_amt*df.sal_multiplier

In [6]: df.head()
Out[6]:
  Name  sal_amt sal_md  sal_multiplier  sal_annual
0    A     4500      M              12       54000
1    B    50000      Y               1       50000
2    C     2000      W              52      104000
3    D     3000      B              26       78000
4    E     5000      M              12       60000

Not exactly what you asked, but this solves your problem accurately in a simple, Pythonic way.
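If it helps, here is a minimal sketch of the same map approach extended to every code that appears in the question's func, with a check for codes that are missing from the dictionary (map would otherwise silently produce NaN for them). The names MULTIPLIERS and annualize are my own, not from the question, and the values for 'S' and 'A' are set to 1 only because func returns the amount unchanged for them; what those codes actually mean is an assumption.

```python
import pandas as pd

# Multipliers copied from the branches of func() in the question.
# 'S' and 'A' are 1 only because func() returns the amount unchanged
# for them; their real meaning is not stated, so treat this as a guess.
MULTIPLIERS = {'M': 12, 'Y': 1, 'H': 8760, 'W': 52, 'B': 26, 'S': 1, 'A': 1}

def annualize(df, amt_col='sal_amt', mode_col='sal_md'):
    """Return annual salaries as a Series; raise if a pay code is unmapped."""
    factor = df[mode_col].map(MULTIPLIERS)
    # map() leaves NaN where a code has no dictionary entry.
    unknown = df.loc[factor.isna(), mode_col].unique()
    if len(unknown) > 0:
        raise ValueError(f"unmapped pay codes: {list(unknown)}")
    return df[amt_col] * factor

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'sal_amt': [4500, 50000, 2000, 3000, 5000],
                   'sal_md': ['M', 'Y', 'W', 'B', 'M']})
df['sal_annual'] = annualize(df)
print(df)
```

Raising on unknown codes is a design choice; if you would rather keep those rows, drop the check and handle the resulting NaN values afterwards.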