Unexpected IndexError when creating a DataFrame

Question

I am trying to run the following code:

heart_df = pd.read_csv(r"location")
X = heart_df.iloc[:, :-1].values
y = heart_df.iloc[:, 11].values

new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values() #this is line 17

cat_cols = new_df.copy()

and I get an IndexError like this:

  File "***location***", line 17, in <module>
  new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

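As far as I know, this IndexError occurs when a float is used as an index, but I don't understand why it happens here.

By creating new_df and then cat_cols, I want to separate out the categorical columns so that I can apply OneHotEncoding at a later stage.

The dataset is here: https://www.kaggle.com/fedesoriano/heart-failure-prediction.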

Answer 1

The error comes from:

X = heart_df.iloc[:, :-1].values

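The .values part converts the DataFrame into a NumPy array. A NumPy array has no column labels, so it cannot be indexed with a list of column names such as "Sex". It only accepts integers, slices, and integer or boolean arrays, which is exactly what the error message says.

So we can write this instead: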
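The difference can be reproduced on a tiny frame. This is only a sketch: toy_df and its two rows are made up for illustration and are not taken from the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the heart dataset.
toy_df = pd.DataFrame({
    "Age": [40, 49],
    "Sex": ["M", "F"],
    "HeartDisease": [0, 1],
})

features = toy_df.iloc[:, :-1]          # still a DataFrame, column labels kept
as_array = toy_df.iloc[:, :-1].values   # plain NumPy array, column labels gone

print(type(features).__name__)   # DataFrame
print(type(as_array).__name__)   # ndarray

# Label-based selection works on the DataFrame...
sex_column = features[["Sex"]]

# ...but a NumPy array only accepts positional or boolean indices.
try:
    as_array[["Sex"]]
    label_indexing_failed = False
except IndexError:
    label_indexing_failed = True

print(label_indexing_failed)
```

The try/except is there only to make the failure visible without stopping the script.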

X = heart_df.iloc[:, :-1]
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]]
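Putting it together, a minimal sketch of the corrected flow might look like the following. The rows in toy_df are hypothetical, and only the column names come from the question. pd.get_dummies is used here as one possible way to one-hot encode; scikit-learn's OneHotEncoder would be another.

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the question.
toy_df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA"],
    "RestingECG": ["Normal", "Normal", "ST"],
    "ExerciseAngina": ["N", "N", "Y"],
    "ST_Slope": ["Up", "Flat", "Up"],
    "HeartDisease": [0, 1, 0],
})

cat_names = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]

X = toy_df.iloc[:, :-1]             # keep X as a DataFrame
y = toy_df["HeartDisease"].values   # the target can safely become an array

# Label-based selection works because X still has column names.
cat_cols = X[cat_names].copy()

# One-hot encode the categorical columns.
encoded = pd.get_dummies(cat_cols)
print(encoded.shape)
```

Note also that .values is an attribute, not a method, so the trailing .values() in the original line would fail with a TypeError even on a DataFrame. If a NumPy array is needed later, convert at the very end, after all label-based selection is done.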

python-3.x

one-hot-encoding

index-error