Error: The truth value of a DataFrame is ambiguous when splitting strings into two columns if two conditions are met

Question

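I am trying to split the strings in column ['first'] if the following two conditions are met:

Column ['first'] contains the word 'floor' or 'floors'
Column ['second'] is empty

However, I get an error message:

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is my code: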

#boolean series for condition 1: when values in column['second'] are empty

only_first_token = pd.isna(results_threshold_50_split_ownership['second']) 
print (len(only_first_token)) 
print (type(only_first_token))

#boolean series for condition 2: when values in column['first'] contain string floor or floors

first_token_contain_floor = results_threshold_50_split_ownership['first'].str.contains('floors|floor',case=False)
print (len(first_token_contain_floor))
print (type(only_first_token))

#if both conditions are met, the string in column['first'] will be split into column['first'] and['second']

if results_threshold_50_split_ownership[(only_first_token) & (first_token_contain_floor)]:
    results_threshold_50_split_ownership.first.str.split('Floors|Floor', expand=True)

print(results_threshold_50_split_ownership['first'])

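I have read several answers here and changed my code a few times. I made sure both boolean Series have the same length, 1016. If I remove the if, the same code successfully finds the rows that satisfy both conditions, so I don't understand why it is ambiguous.

Any help would be appreciated. Thank you very much.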

Answer 1

Your conditions are perfectly fine. The problem is the if statement, which effectively reads:

if boolean_array :
  ...

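But if expects a single boolean value, not a whole array of booleans. To reduce a boolean array to one value you can use, for example, any() or all(), as the error message suggests: if all(boolean_array): and so on.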
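To make the difference concrete, here is a minimal sketch using a made-up three-element Series (the data and the name mask are illustrative and not taken from the question). It shows that putting the mask directly in an if raises the error, while any() and all() each reduce it to a single bool.

```python
import pandas as pd

# Hypothetical mask, standing in for (only_first_token) & (first_token_contain_floor)
mask = pd.Series([False, True, False])

try:
    if mask:  # a whole Series has no single truth value
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)  # the "truth value ... is ambiguous" message

any_match = bool(mask.any())  # True if at least one row matches
all_match = bool(mask.all())  # True only if every row matches
print(any_match, all_match)
```

Which reduction is appropriate depends on the intent: any() asks whether at least one row qualifies, all() whether every row does.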

What you actually want to do is probably:

results_threshold_50_split_ownership[(only_first_token) & (first_token_contain_floor)]['first'].str.split('Floors|Floor', expand=True)

That is, use the boolean array for boolean indexing.
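As a small illustration of boolean indexing, assuming a made-up three-row frame (df, mask, selected and parts are hypothetical names): indexing with the mask keeps only the matching rows, and the split returns a new DataFrame, so the original stays unchanged until the result is assigned back.

```python
import pandas as pd

# Illustrative data, not the asker's real frame
df = pd.DataFrame({'first': ['first floor value', 'all floors values', 'x'],
                   'second': ['y', None, None]})

# Both conditions combined element-wise into one boolean Series
mask = pd.isna(df['second']) & df['first'].str.contains('floors|floor', case=False)

# Boolean indexing: only rows where mask is True are kept
selected = df[mask]

# Split the selected strings once (n=1), giving at most two columns
parts = selected['first'].str.split('floors|floor', n=1, expand=True)

print(list(selected.index))
print(parts.shape)
print(df.loc[1, 'first'])  # the original frame is not modified by the split
```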

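Update based on the comments below:
You can assign the split result back to the original columns by using results_threshold_50_split_ownership.loc[(only_first_token) & (first_token_contain_floor), ['first', 'second']] as the assignment target. In that case, however, you need to make sure at most 2 columns are returned by specifying n=1 in the split function (in case your first column contains the word 'floor' more than once).
Example: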

import pandas as pd

results_threshold_50_split_ownership = pd.DataFrame({'first': ['first floor value', 'all floors values', 'x'],
                                                     'second': ['y', None, None]})
print(results_threshold_50_split_ownership)
#               first second
#0  first floor value      y
#1  all floors values   None
#2                  x   None
only_first_token = pd.isna(results_threshold_50_split_ownership['second'])
first_token_contain_floor = results_threshold_50_split_ownership['first'].str.contains('floors|floor',case=False)
# pass n by keyword: newer pandas releases reject it as a positional argument
results_threshold_50_split_ownership.loc[(only_first_token) & (first_token_contain_floor), ['first', 'second']] = results_threshold_50_split_ownership[(only_first_token) & (first_token_contain_floor)]['first'].str.split('floors|floor', n=1, expand=True).to_numpy()
print(results_threshold_50_split_ownership)
#               first   second
#0  first floor value        y
#1               all    values
#2                  x     None
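The pieces produced by the split keep the spaces around the separator, and the split pattern is case-sensitive while the contains check is not. If that matters, one possible variation is sketched below. The helper name split_on_floor, the (?i) inline flag and the whitespace stripping are assumptions about the desired result and are not part of the original answer.

```python
import pandas as pd

def split_on_floor(df):
    """Hypothetical helper: return a copy with 'first' split into 'first'/'second'."""
    out = df.copy()
    # Rows where 'second' is empty and 'first' mentions floor/floors (any case)
    mask = out['second'].isna() & out['first'].str.contains(r'floors?', case=False, na=False)
    if mask.any():  # reduce the mask to one bool before using it in an if
        # (?i) makes the split case-insensitive; n=1 caps the result at two parts
        parts = out.loc[mask, 'first'].str.split(r'(?i)floors?', n=1, expand=True)
        out.loc[mask, 'first'] = parts[0].str.strip()
        out.loc[mask, 'second'] = parts[1].str.strip()
    return out

example = pd.DataFrame({'first': ['first floor value', 'All Floors values', 'x'],
                        'second': ['y', None, None]})
result = split_on_floor(example)
print(result)
```

Working on a copy keeps the input frame intact, which makes it easier to compare before and after while checking the mask.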

Tags: python, boolean-logic, if-statement, pandas