Arithmetic on a Series in Pandas DF With Conditional - Prior Operation Gets Overwritten
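I'm scraping some salary data and need to convert it to an hourly or annual rate depending on another column. I've worked out how to do this (it may not be the most efficient approach), and it works for a single line.
Data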
import pandas as pd, numpy as np
columns = ['Location','Hourly','Annually','Monthly','Daily','Average','Hourly_Rate','Annual_Rate']
df = pd.DataFrame(columns=columns)
df.loc[1] = ['A',True,False,False,False,10.10,np.nan,np.nan]
df.loc[2] = ['B',False,True,False,False,50000,np.nan,np.nan]
df['Annual_Rate'] = (df['Average'] * 2080).where(df['Hourly'] == True) #need this line to run and not get overwritten
df['Annual_Rate'] = df['Average'].where(df['Annually'] == True ) #overwrites prior line
df['Annual_Rate'] = df['Average'].where(df['Annually'] == True & pd.isna(df['Annual_Rate'])) #overwrites prior line and is incorrect
df['Hourly_Rate'] = (df['Average'] / 2080).where([(df['Annually'] == True) & (pd.isnull(df['Hourly_Rate']))])
df['Hourly_Rate'] = df['Average'].where(df['Hourly'] == True & (pd.isna(df['Hourly_Rate'])))
df['Hourly_Rate'] = df['Average'].where(df['Hourly'] == True)
df.head(10)
These are the lines I need to get working:
df['Hourly_Rate'] = (df['Average'] / 2080).where([(df['Annually'] == True) & (pd.isnull(df['Hourly_Rate']))])
df['Annual_Rate'] = (df['Average'] * 2080).where(df['Hourly'] == True)
Desired result:
+---+----------+--------+----------+---------+-------+---------+-------------+-------------+
|   | Location | Hourly | Annually | Monthly | Daily | Average | Hourly_Rate | Annual_Rate |
+---+----------+--------+----------+---------+-------+---------+-------------+-------------+
| 1 | A        | TRUE   | FALSE    | FALSE   | FALSE | 10.1    | 10.1        | 21008       |
| 2 | B        | FALSE  | TRUE     | FALSE   | FALSE | 50000   | 24.03846154 | 50000       |
+---+----------+--------+----------+---------+-------+---------+-------------+-------------+
Thanks in advance.
pd.Series.where does not work the same way as numpy.where. The latter lets you express a vectorized if-else condition and is probably what you need:
df['Annual_Rate'] = np.where(df['Hourly'], df['Average'] * 2080, df['Average'])
df['Hourly_Rate'] = np.where(df['Annually'] & df['Hourly_Rate'].isnull(),
                             df['Average'] / 2080, df['Average'])
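For reference, here is a minimal end-to-end sketch that applies the two np.where lines above to the sample data from the question. It builds the frame with the pd.DataFrame constructor instead of row-by-row df.loc assignment (an assumption made so that the flag columns come out as bool and Average as float64); the conversion logic itself is unchanged.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, built in one constructor call.
columns = ['Location', 'Hourly', 'Annually', 'Monthly', 'Daily',
           'Average', 'Hourly_Rate', 'Annual_Rate']
df = pd.DataFrame(
    [['A', True, False, False, False, 10.10, np.nan, np.nan],
     ['B', False, True, False, False, 50000, np.nan, np.nan]],
    columns=columns, index=[1, 2])

# np.where(cond, a, b) takes a where cond is True and b elsewhere,
# so each column is filled for every row in a single assignment.
df['Annual_Rate'] = np.where(df['Hourly'], df['Average'] * 2080, df['Average'])
df['Hourly_Rate'] = np.where(df['Annually'] & df['Hourly_Rate'].isnull(),
                             df['Average'] / 2080, df['Average'])

print(df[['Location', 'Average', 'Hourly_Rate', 'Annual_Rate']])
```

Note that a row that is neither Hourly nor Annually (for example a Monthly or Daily rate) just gets Average copied into both columns, since np.where only expresses a two-way choice.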
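pd.Series.where keeps the values of the series where the condition is True and replaces them with a given value where it is False (NaN if no replacement is given). Because those rows become NaN, assigning the result back to a column discards whatever an earlier line wrote there. As the docs put it: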
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
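A small sketch of why the original lines overwrite each other, using a cut-down version of the question's frame (only the columns involved). It also shows two ways to stay with pd.Series.where: pass the other argument so the non-matching rows are filled instead of set to NaN, or keep the first result and fill only its gaps with Series.fillna.

```python
import pandas as pd

# Cut-down version of the question's frame (only the columns used here).
df = pd.DataFrame({'Hourly': [True, False],
                   'Annually': [False, True],
                   'Average': [10.10, 50000.0]},
                  index=[1, 2])

# Without `other`, rows where the condition is False become NaN ...
first = (df['Average'] * 2080).where(df['Hourly'])
print(first.tolist())   # [21008.0, nan]

# ... so a later .where() assigned to the same column wipes out the
# 21008.0 computed by the earlier one.
second = df['Average'].where(df['Annually'])
print(second.tolist())  # [nan, 50000.0]

# Option 1: supply `other`, so each column is filled in one assignment.
df['Annual_Rate'] = (df['Average'] * 2080).where(df['Hourly'], df['Average'])
df['Hourly_Rate'] = (df['Average'] / 2080).where(df['Annually'], df['Average'])

# Option 2: keep the earlier result and only fill the rows it left as NaN.
combined = first.fillna(second)
print(combined.tolist())  # [21008.0, 50000.0]
```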
Also note that you can use the Boolean series directly instead of testing df[col] == True.
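The == True comparison is also what breaks the combined conditions in the question: in Python, & binds more tightly than ==, so df['Hourly'] == True & pd.isna(df['Hourly_Rate']) is evaluated as df['Hourly'] == (True & pd.isna(df['Hourly_Rate'])). The sketch below uses a hypothetical two-row frame in which the second row is not hourly but already has an Hourly_Rate, so the two forms produce different masks. Using the Boolean column directly (or parenthesizing each comparison) gives the intended result.

```python
import pandas as pd

# Hypothetical two-row frame: row 2 is not hourly and already has a rate.
df = pd.DataFrame({'Hourly': [True, False],
                   'Hourly_Rate': [float('nan'), 24.0]},
                  index=[1, 2])

# Parsed as df['Hourly'] == (True & pd.isna(df['Hourly_Rate'])),
# because & binds more tightly than ==.
unparenthesized = df['Hourly'] == True & pd.isna(df['Hourly_Rate'])
print(unparenthesized.tolist())  # [True, True]  (row 2 wrongly selected)

# Using the Boolean column directly gives the intended row-wise AND.
intended = df['Hourly'] & pd.isna(df['Hourly_Rate'])
print(intended.tolist())  # [True, False]
```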