How do I prevent KeyErrors in Python?

Question

I'm running into a KeyError while working with MinMaxScaler in a machine learning project. Here is the relevant code:

df = pd.read_csv(io.BytesIO(uploaded['Root_Work_Sample.csv']))
print(df.shape)
print(df.columns)
display(df.head(5))
print(df.dtypes)
train_cols = ["feature1, feature2, feature3, feature4, feature5, feature6, feature7, feature8, feature9, feature10, feature11, feature12, feature13, feature14, y"]
df_train, df_test = train_test_split(df, train_size=1000, test_size=876, shuffle=False)
print("Train--Test size", len(df_train), len(df_test))
print(df_train)
print(df_test)
 
# scale the feature MinMax, build array
x = df_train.loc[:,train_cols].values  #THE ERROR IS ON THIS LINE
min_max_scaler = MinMaxScaler()
x_train = min_max_scaler.fit_transform(x)
x_test = min_max_scaler.transform(df_test.loc[:,train_cols])

This is the error I get:

KeyError: "None of [Index(['feature1, feature2, feature3, feature4, feature5, feature6, feature7, feature8, feature9, feature10, feature11, feature12, feature13, feature14, y'], dtype='object')] are in the [columns]"

Any suggestions on how to fix this, and on general practices a beginner like me can follow to avoid this kind of error?

Answer 1

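The KeyError comes from train_cols. All the column names sit inside a single pair of quotes, so the list contains one long string and pandas looks for a single column with that entire name. df_train itself is still a DataFrame here (train_test_split returns DataFrames when it is given a DataFrame), so loc is fine to use; the more conventional way to call train_test_split, though, is to pass the features and the target separately. Put each feature name in its own quotes, like this: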

train_cols = ["feature1", "feature2", ....]
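
To see the difference concretely, here is a minimal sketch on a tiny made-up DataFrame (the values are invented and only three of the column names from the question are used). It assumes a reasonably recent pandas, where selecting a list of labels that are all missing raises a KeyError:

```python
import pandas as pd

# Small stand-in frame; the column names mirror the question, the values are made up.
df = pd.DataFrame({"feature1": [1.0, 2.0], "feature2": [3.0, 4.0], "y": [0, 1]})

wrong_cols = ["feature1, feature2, y"]      # one element: a single long string
right_cols = ["feature1", "feature2", "y"]  # three elements: one per column

# Splitting the long string on commas recovers the intended names.
split_cols = [name.strip() for name in wrong_cols[0].split(",")]

try:
    df.loc[:, wrong_cols]   # pandas looks for ONE column with the whole string as its name
    raised = False
except KeyError:
    raised = True

selected = df.loc[:, right_cols]  # works: every label is a real column
```

A quick len(train_cols) check is often enough to catch this: the list in the question has length 1, not 15.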

Try this:

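from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# train_cols should list only the feature columns here; the target "y" is selected separately
X, y = df[train_cols], df["y"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=876, shuffle=False)

# fit the scaler on the training split only, then reuse it on the test split
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)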
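
As for avoiding this kind of error in general, one option is to check the column names before selecting them, so a mismatch is reported in plain words. The sketch below is only an illustration: require_columns is a hypothetical helper (it is not part of pandas or scikit-learn) and the data is made up.

```python
import pandas as pd

def require_columns(df, columns):
    """Hypothetical helper: fail early with a readable message if any column is missing."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}; available: {list(df.columns)}")
    return df.loc[:, columns]

# Made-up data using two of the column names from the question.
df = pd.DataFrame({"feature1": [0.1, 0.5], "feature2": [1.0, 2.0], "y": [0, 1]})

features = require_columns(df, ["feature1", "feature2"])  # fine

try:
    require_columns(df, ["feature1, feature2"])  # the mistake from the question
    message = ""
except KeyError as err:
    message = str(err)  # names the bad label and lists the real columns
```

Comparing the requested names against df.columns (which the question already prints) makes it obvious when a name is misspelled, has stray whitespace, or was accidentally merged into one string.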

python

dataframe

pandas

scikit-learn

keyerror