How to check a column for a str value, determine whether another column is less/greater than [x], and return a boolean in a newly made column
I have a dataframe that looks like this:
| product | duration |
|---|---|
| tire change | 01:16:51 |
| oil change | 05:06:00 |
| tire change | 02:03:04 |
| oil change | 06:23:14 |
| oil change | 03:40:27 |
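I want to create a new column that returns a boolean based on those 2 columns (a tire change should take under 4 hours, an oil change under 2 hours):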
| product | duration | duration_bool |
|---|---|---|
| tire change | 01:16:51 | True |
| oil change | 01:06:00 | True |
| tire change | 04:03:04 | False |
| oil change | 02:23:14 | False |
| oil change | 03:40:27 | False |
Is this the right way to actually use a function on a dataframe? I can't tell whether it actually accomplishes my goal.
```python
def sla_bool_checker(my_var):
    #check if product is a tire change, if it is, check if duration is under 4 hours and return the Boolean in the new column
    if df['product'] == 'tire change' :
        df['duration_bool'] = df['duration'] < pd.Timedelta(4, unit='h')
    #check if product is a oil change, if it is, check if duration is under 2 hours and return the Boolean
    elif df['product'] == 'oil change' :
        df['duration_bool'] < pd.Timedelta(2, unit='h')
```
I don't know what I'm missing, but this code raises an error:

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
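To see why the `if df['product'] == 'tire change':` line fails, here is a minimal sketch on a small hypothetical frame. Comparing a whole column with a string produces one boolean per row, and `if` cannot turn that Series into a single True/False without being told whether you mean "any row" (`.any()`) or "all rows" (`.all()`).

```python
import pandas as pd

df = pd.DataFrame({"product": ["tire change", "oil change", "tire change"]})

# Comparing a column with a string is element-wise: the result is a
# boolean Series with one value per row, not a single True/False.
mask = df["product"] == "tire change"
print(mask.tolist())  # [True, False, True]

# An `if` needs exactly one truth value, so pandas refuses to guess.
error_name = None
try:
    if mask:
        pass
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)  # ValueError

# Reducing the Series explicitly gives a single bool.
print(bool(mask.any()), bool(mask.all()))  # True False
```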
Create a boolean array from your conditions and assign it to the new column:
```python
import pandas as pd

df['duration'] = df['duration'].apply(pd.Timedelta)  # make sure duration has a dtype of Timedelta
df['duration_bool'] = ((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h'))) | \
                      ((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h')))
```
```
       product        duration  duration_bool
0  tire change 0 days 01:16:51           True
1   oil change 0 days 05:06:00          False
2  tire change 0 days 02:03:04           True
3   oil change 0 days 06:23:14          False
4   oil change 0 days 03:40:27          False
```
Here is what the expression means:
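```python
((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h')))
```

where the product is a tire change and the duration is under 4 hours,

`|` (or)

```python
((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h')))
```

where the product is an oil change and the duration is under 2 hours.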
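The same mask can be built in named steps, which may make the `&`/`|` logic easier to follow. This is a sketch that assumes the sample input from the question, with `duration` stored as `hh:mm:ss` strings; the intermediate names (`is_tire`, `under_4h`, and so on) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "05:06:00", "02:03:04", "06:23:14", "03:40:27"],
})
# Parse the "hh:mm:ss" strings into timedeltas so they can be compared.
df["duration"] = pd.to_timedelta(df["duration"])

# Each comparison yields a boolean Series aligned on the row index.
is_tire = df["product"] == "tire change"
is_oil = df["product"] == "oil change"
under_4h = df["duration"] < pd.Timedelta(4, unit="h")
under_2h = df["duration"] < pd.Timedelta(2, unit="h")

# & is element-wise AND and | is element-wise OR. Python's `and`/`or` would
# try to reduce each Series to one bool and raise the same ValueError.
df["duration_bool"] = (is_tire & under_4h) | (is_oil & under_2h)
print(df)
```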
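First, the durations in your two examples don't match, which makes it hard to compare the input with the output. Please double-check that next time. That said, you can use: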
```python
import pandas as pd

df.loc[df["product"] == "tire change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(4, unit="h")
df.loc[df["product"] == "oil change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(2, unit="h")
```
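This sets `duration_bool` directly, but only on the rows selected by each `product` condition, to the result of comparing the duration with `pd.Timedelta(...)`. `pd.to_timedelta(...)` makes sure the `duration` values are converted to timedeltas before that comparison.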
This gives you:
| | product | duration | duration_bool |
|---:|:------------|:-----------|:----------------|
| 0 | tire change | 01:16:51 | True |
| 1 | oil change | 01:06:00 | True |
| 2 | tire change | 04:03:04 | False |
| 3 | oil change | 02:23:14 | False |
| 4 | oil change | 03:40:27 | False |
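One caveat with the `.loc` approach: rows whose `product` matches neither condition are never assigned, so they end up as `NaN` in `duration_bool`. A minimal sketch of one way around that, assuming a hypothetical extra `car wash` row, is to initialize the column to `False` first and then overwrite only the matching rows:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change", "car wash"],
    "duration": ["01:16:51", "01:06:00", "04:03:04", "02:23:14", "03:40:27", "00:30:00"],
})
df["duration"] = pd.to_timedelta(df["duration"])

# Start every row at False so rows matching neither product get a real
# bool instead of NaN.
df["duration_bool"] = False

tire = df["product"] == "tire change"
oil = df["product"] == "oil change"

# Only the rows selected by each mask are overwritten. Filtering the
# right-hand side with the same mask is optional (pandas aligns on the
# index), but it makes explicit which values get written.
df.loc[tire, "duration_bool"] = df.loc[tire, "duration"] < pd.Timedelta(4, unit="h")
df.loc[oil, "duration_bool"] = df.loc[oil, "duration"] < pd.Timedelta(2, unit="h")
print(df)
```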
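I found that I needed to add a `return` clause to `def sla_bool_checker`, and then use `apply` to apply the returned values to my dataframe. I still don't fully understand how `apply` works, but it does work, and I wish I could give a more in-depth explanation for anyone who needs it.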
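On how `apply` works here: with `axis=1`, pandas calls the function once per row and passes that row in as a Series indexed by the column names. That is why `my_var['product']` is a single string and the `if` no longer raises the ambiguity error. A small sketch with a hypothetical `classify` function that records what it receives:

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change"],
    "duration": pd.to_timedelta(["01:16:51", "01:06:00", "04:03:04"]),
})

observed = []  # (type name, column labels) seen on each call

def classify(row):
    # With axis=1, `row` is a Series whose index is the column names, so
    # row["product"] is one plain string and this `if` is unambiguous.
    observed.append((type(row).__name__, list(row.index)))
    if row["product"] == "tire change":
        return "tire"
    return "other"

# apply collects the return values into a new Series aligned with df's index.
result = df.apply(classify, axis=1)
print(result.tolist())  # ['tire', 'other', 'tire']
print(observed[0])      # ('Series', ['product', 'duration'])
```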
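I probably should have used `np.where()` (I'm still not clear on how to make that work), but @it_is_chris's answer actually worked really well for me too! (Thanks, Chris.)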
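For reference, an `np.where()` version could look like the sketch below (it is not taken from the answers above and assumes `duration` already holds timedeltas). `np.where(condition, a, b)` picks element-wise from `a` where the condition is True and from `b` elsewhere, so the two products can be handled with a nested call; `np.select` is a flatter alternative once there are several conditions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": pd.to_timedelta(["01:16:51", "01:06:00", "04:03:04", "02:23:14", "03:40:27"]),
})

under_4h = df["duration"] < pd.Timedelta(4, unit="h")
under_2h = df["duration"] < pd.Timedelta(2, unit="h")

# Nested np.where: tire changes use the 4-hour check, oil changes the
# 2-hour check, and any other product gets False.
df["duration_bool"] = np.where(
    df["product"] == "tire change", under_4h,
    np.where(df["product"] == "oil change", under_2h, False),
)

# np.select: the first matching condition decides which choice is used.
conditions = [df["product"] == "tire change", df["product"] == "oil change"]
choices = [under_4h, under_2h]
df["duration_bool_select"] = np.select(conditions, choices, default=False)
print(df)
```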
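Since then I've kept researching, because I really wanted to find a way to do this with a function. It may not be ideal, but I learned a lot.
Here's the code I used: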
```python
import pandas as pd

def sla_bool_checker(my_var):
    # my_var is one row (a Series); df['duration'] must already hold Timedeltas
    # check if product is a tire change; if so, check whether duration is under 4 hours and return the boolean
    if my_var['product'] == 'tire change':
        return my_var['duration'] < pd.Timedelta(4, unit='h')
    # check if product is an oil change; if so, check whether duration is under 2 hours and return the boolean
    elif my_var['product'] == 'oil change':
        return my_var['duration'] < pd.Timedelta(2, unit='h')
```
Then I used:
```python
df['duration_bool'] = df.apply(sla_bool_checker, axis=1)
df
```
which results in:
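| | product | duration | duration_bool |
|---:|:------------|:-----------|:----------------|
| 0 | tire change | 01:16:51 | True |
| 1 | oil change | 01:06:00 | True |
| 2 | tire change | 04:03:04 | False |
| 3 | oil change | 02:23:14 | False |
| 4 | oil change | 03:40:27 | False |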
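If more products get their own time limits, another option (a swapped-in technique, not one of the answers above) is to keep the limits in a dictionary, let `Series.map` look up each row's limit, and compare once. The `limits` dictionary below is hypothetical and only mirrors the 4-hour and 2-hour rules from the question; a product missing from it maps to `NaT`, and comparisons with `NaT` evaluate to False.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": pd.to_timedelta(["01:16:51", "01:06:00", "04:03:04", "02:23:14", "03:40:27"]),
})

# Hypothetical lookup table: maximum allowed duration per product.
limits = {
    "tire change": pd.Timedelta(4, unit="h"),
    "oil change": pd.Timedelta(2, unit="h"),
}

# map() turns each product name into that row's limit, then a single
# vectorised comparison checks every row at once.
row_limit = df["product"].map(limits)
df["duration_bool"] = df["duration"] < row_limit
print(df)
```

Adding a product then only means adding one dictionary entry rather than another `if`/`elif` branch or mask.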