Python 3.X Pandas - ValueError: Must pass DataFrame with boolean values only

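I have several pandas DataFrames, and I'm trying to slice each one from a specific column (the target column) onward. The problem is that the target column sits at a different index position in each DataFrame. The DataFrames look like this: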

+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data       | Data | Data |
+------------+------+------+

+------+------------+------+
| Col1 | Target Col | Col3 |
+------+------------+------+
| Data | Data       | Data |
+------+------------+------+

What I want to extract from each df is the target column and every column after it:

+------------+------+
| Target Col | Col3 |
+------------+------+
| Data       | Data |
+------------+------+
+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data       | Data | Data |
+------------+------+------+

My code so far (shortened for clarity):

for files in dir:    
    df = pd.read_excel(files)    
    target_cols = [col for col in df if col.startswith('Target Col')]
    list_data = list(df.columns)
    table_tail = df.iloc[:, list_data.index(target_cols[0]):]

The error I get is "ValueError: Must pass DataFrame with boolean values only".

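The code is written this way (going in and out of lists, a bit convoluted) because I'm trying to slice several pandas DataFrames by index position. If anyone has a shorter, simpler way to make this work, I'd be glad to hear some options.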

You could consider using a boolean mask together with np.cumsum:

import pandas as pd
import numpy as np

cols1 = ["Target Col", "Col2", "Col3"]
cols2 = ["Col1", "Target Col", "Col3"]
df = pd.DataFrame(np.random.randn(4,3),columns=cols1)

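# True only at the position of the target column
target_cols = [col=='Target Col' for col in df]
# The running sum is 0 before the target and >= 1 from it onward,
# so casting to bool marks the target column and everything after it
target_cols = np.cumsum(target_cols).astype(bool)

# Keep only the columns selected by the mask
df = df[df.columns[target_cols]]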
print(df)

   Target Col      Col2      Col3
0   -0.191493  1.382337  1.030406
1   -0.008358  0.262019 -1.744335
2   -0.218022  0.010588  0.373674
3   -0.585362 -0.664626 -1.030293


df = pd.DataFrame(np.random.randn(4,3),columns=cols2)
target_cols = [col=='Target Col' for col in df]
target_cols = np.cumsum(target_cols).astype(bool)

df = df[df.columns[target_cols]]
print(df)

   Target Col      Col3
0   -1.677061  0.123344
1   -0.616199 -0.277216
2   -0.541302 -0.635904
3    0.821543 -0.826233
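
To make the mask trick easier to follow, here is a minimal sketch that prints each intermediate step for the second column layout. The variable names (is_target, running, mask) are only illustrative and do not appear in the answer above.

```python
import numpy as np

cols = ["Col1", "Target Col", "Col3"]

# Step 1: flag the target column by name
is_target = [col == "Target Col" for col in cols]

# Step 2: running total; it stays 0 until the target is reached
running = np.cumsum(is_target)

# Step 3: any non-zero count becomes True
mask = running.astype(bool)

print(is_target)         # [False, True, False]
print(running.tolist())  # [0, 1, 1]
print(mask.tolist())     # [False, True, True]
```

Because the running total never drops back to zero, every column to the right of the target stays selected.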

Here is an approach that uses the get_loc method:

cols_to_select = [x for en, x in enumerate(df.columns) if en >= df.columns.get_loc('Target Col')]

df = df[cols_to_select]
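
If a shorter version is wanted, the position returned by get_loc can probably be passed straight to iloc. The sketch below is one possible way to do that, not a tested fix for the original error. The helper name slice_from_target and its prefix parameter are made up for illustration, and it assumes the column labels are unique (with duplicate labels, get_loc may return a slice or a mask rather than an integer).

```python
import numpy as np
import pandas as pd


def slice_from_target(df, prefix="Target Col"):
    """Return the first column whose name starts with `prefix`
    plus every column to its right (hypothetical helper)."""
    # str() guards against non-string column labels
    matches = [col for col in df.columns if str(col).startswith(prefix)]
    if not matches:
        raise KeyError(f"no column starts with {prefix!r}")
    # Assumes unique labels, so get_loc returns a single integer position
    start = df.columns.get_loc(matches[0])
    return df.iloc[:, start:]


df1 = pd.DataFrame(np.zeros((2, 3)), columns=["Target Col", "Col2", "Col3"])
df2 = pd.DataFrame(np.zeros((2, 3)), columns=["Col1", "Target Col", "Col3"])

print(list(slice_from_target(df1).columns))  # ['Target Col', 'Col2', 'Col3']
print(list(slice_from_target(df2).columns))  # ['Target Col', 'Col3']
```

When the exact column name is known and unique, label-based slicing such as df.loc[:, 'Target Col':] should give the same result in a single line.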