How to check a column for str value, determine if another column is less/greater than [x] return boolean in newly made column

Question

I have a dataframe that looks like this:

product	duration
tire change	01:16:51
oil change	05:06:00
tire change	02:03:04
oil change	06:23:14
oil change	03:40:27

I want to create a new column that returns a Boolean based on the two columns:

product	duration	duration_bool
tire change	01:16:51	True
oil change	01:06:00	True
tire change	04:03:04	False
oil change	02:23:14	False
oil change	03:40:27	False

Is this the right way to actually use a function on a dataframe? I can't tell whether it really accomplishes my goal.

def sla_bool_checker(my_var):

    #check if product is a tire change, if it is, check if duration is under 4 hours and return the Boolean in the new column

    if df['product'] == 'tire change' :
        df['duration_bool'] = df['duration'] < pd.Timedelta(4, unit='h')

    #check if product is a oil change, if it is, check if duration is under 2 hours and return the Boolean

    elif df['product'] == 'oil change' :
        df['duration_bool'] < pd.Timedelta(2, unit='h')

I don't know what I'm missing, but this code raises an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

Create a Boolean array from your conditions and assign it to the new column.

df['duration'] = df['duration'].apply(pd.Timedelta) # make sure duration has a dtype of Timedelta

df['duration_bool'] = ((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h'))) | \
((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h')))

       product        duration  duration_bool
0  tire change 0 days 01:16:51           True
1   oil change 0 days 05:06:00          False
2  tire change 0 days 02:03:04           True
3   oil change 0 days 06:23:14          False
4   oil change 0 days 03:40:27          False

What this means:

((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h'))) where product equals tire change and duration is less than 4 hours.

| or

((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h'))) where product equals oil change and duration is less than 2 hours.
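If more products are added later, the chained conditions above grow quickly. One possible alternative, sketched here under the assumption that each product has a single upper limit, is to map every product to its limit and compare once. The name sla_limits is hypothetical and not part of the question; the sample data is the first table from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "05:06:00", "02:03:04", "06:23:14", "03:40:27"],
})
df["duration"] = pd.to_timedelta(df["duration"])

# Hypothetical lookup table: product name -> maximum allowed duration.
sla_limits = {
    "tire change": pd.Timedelta(4, unit="h"),
    "oil change": pd.Timedelta(2, unit="h"),
}

# Look up the limit for each row; products missing from the table become NaT.
limit = pd.to_timedelta(df["product"].map(sla_limits))

# Element-wise comparison; a comparison against NaT evaluates to False.
df["duration_bool"] = df["duration"] < limit
print(df)
```

On this sample the result matches the output shown above: True for both tire changes and False for all three oil changes.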

Answer 2

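First, the durations in your two examples don't match, which makes it hard to compare the input with the expected output. Please check that next time. Then you can use: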

df.loc[df["product"] == "tire change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(4, unit="h")
df.loc[df["product"] == "oil change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(2, unit="h")

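This sets duration_bool, for the rows matching each product, to the result of the comparison against pd.Timedelta(...); pd.to_timedelta(...) makes sure the duration column is a timedelta that can be compared. This gives you: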

|    | product     | duration   | duration_bool   |
|---:|:------------|:-----------|:----------------|
|  0 | tire change | 01:16:51   | True            |
|  1 | oil change  | 01:06:00   | True            |
|  2 | tire change | 04:03:04   | False           |
|  3 | oil change  | 02:23:14   | False           |
|  4 | oil change  | 03:40:27   | False           |

Answer 3

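I found out that I needed to add a return statement to def sla_bool_checker, and then use apply to apply the returned value to my dataframe. I still don't understand exactly how apply works, but it does work, and I hope I can provide a more in-depth explanation for anyone who needs it.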

I probably should have used np.where() (I'm still unclear on how to make it work), but @it_is_chris's answer actually worked well for me too! (Thanks, Chris.)
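For anyone curious about the np.where() route, here is a minimal sketch of how it might look. It assumes the same two rules (tire change under 4 hours, oil change under 2 hours) and treats any other product as False; the names is_tire and is_oil are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "01:06:00", "04:03:04", "02:23:14", "03:40:27"],
})
duration = pd.to_timedelta(df["duration"])

# Boolean masks, one per product (illustrative names).
is_tire = df["product"] == "tire change"
is_oil = df["product"] == "oil change"

# np.where(condition, value_if_true, value_if_false) works element-wise:
# tire-change rows use the 4-hour rule; every other row is True only if
# it is an oil change that took less than 2 hours.
df["duration_bool"] = np.where(
    is_tire,
    duration < pd.Timedelta(4, unit="h"),
    is_oil & (duration < pd.Timedelta(2, unit="h")),
)
print(df)
```

With more than two products, np.select() with a list of conditions and a list of choices is probably easier to read than nested np.where() calls.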

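I have kept researching since then, because I really wanted to find a way to do this with a function. It may not be ideal, but I learned a lot.

Here is the code I used.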

def sla_bool_checker(my_var):
    #check if product is a tire change, if it is, check if duration is under 4 hours and return the Boolean in new column
    if my_var['product'] == 'tire change' :
        return my_var['duration'] < pd.Timedelta(4, unit='h')
    #check if product is an oil change, if it is, check if duration is under 2 hours and return the Boolean
    elif my_var['product'] == 'oil change' :
        return my_var['duration'] < pd.Timedelta(2, unit='h')

Then I used

df['duration_bool'] = df.apply(sla_bool_checker, axis=1)     
df

which results in

	product	duration	duration_bool
0	tire change	01:16:51	True
1	oil change	01:06:00	True
2	tire change	04:03:04	False
3	oil change	02:23:14	False
4	oil change	03:40:27	False
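As far as I can tell, the reason this works while the original attempt failed is that df.apply(..., axis=1) calls the function once per row and passes that row in as a Series, so my_var['product'] is a single string and a plain if is fine. In the original code, df['product'] == 'tire change' is a whole Series of Booleans, which if cannot reduce to one True/False. The small sketch below illustrates both points; describe_row is a hypothetical helper used only for demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change"],
    "duration": pd.to_timedelta(["01:16:51", "01:06:00"]),
})

def describe_row(row):
    # Hypothetical helper: report what apply(axis=1) hands to the function.
    return f"{type(row).__name__}:{row['product']}"

# Each call receives one row as a Series; row['product'] is a plain string.
seen = df.apply(describe_row, axis=1)
print(seen.tolist())

# Using a whole Boolean Series in an if statement raises the ValueError
# from the question.
error_message = None
try:
    if df["product"] == "tire change":
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Two caveats: the duration column must already hold timedeltas (for example via pd.to_timedelta) before calling sla_bool_checker, and the function returns None for any product that matches neither branch.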

python

if-statement

boolean

filter

dataframe