Iterating over a Dataframe's columns using a list of column names in Python

Question

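I am trying to label-encode specific columns of a Dataframe. I have stored the names of those columns in a list (cat_features). Now I want to use a for loop to iterate over the elements of this list (which are strings) and use them to access the dataframe's columns. But it says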

TypeError: argument must be a string or number

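I am accessing list elements that are already strings, so I don't understand why this error is thrown. Please help me understand why it doesn't work and what I can do to make it work.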

cat_features = [x for x in features if x not in features_to_scale]

from sklearn.preprocessing import LabelEncoder

for feature in cat_features:
    le = LabelEncoder()
    dataframe[feature] = le.fit_transform(dataframe[feature])

Answer 1

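The error is about the values inside the columns, not the column names: it means that one or more of your columns contain a list/tuple/set or something similar. (A column that mixes types, such as strings alongside NaN, can trigger the same error.) To fix it, convert those values to strings before applying the label encoder.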
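To find out which columns are affected, one option is to list the Python types present in each column. The following is only a sketch: the sample `dataframe` and its column names are hypothetical, and treating any column that is not purely `str` as suspect is an assumption, not something the encoder itself reports.

```python
import pandas as pd

# Hypothetical sample data: "tags" holds lists, "city" mixes strings with NaN.
dataframe = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "tags": [["a"], ["b", "c"], ["a"]],
    "city": ["Paris", float("nan"), "Rome"],
})
cat_features = ["color", "tags", "city"]

# Collect the names of the Python types that occur in each column.
types_per_column = {
    col: sorted({type(value).__name__ for value in dataframe[col]})
    for col in cat_features
}

# Assumption: a column containing anything other than plain strings is a likely culprit.
suspect = [col for col, names in types_per_column.items() if names != ["str"]]

print(types_per_column)
print(suspect)
```

Any column reported in `suspect` is a candidate for the string conversion described above.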

Also, instead of looping, you can first filter the dataframe down to the features you need and then use the apply function:

from sklearn.preprocessing import LabelEncoder

df = main_df[cat_features]
df = df.astype(str)  # cast every column to str, since LabelEncoder cannot handle lists/tuples/sets

lb = LabelEncoder()
df = df.apply(lb.fit_transform)  # apply returns a new dataframe, so assign the result

Later you can combine this dataframe with the remaining continuous features.
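That combining step could look roughly like the sketch below. The sample `main_df`, its column names, and the choice of `pd.concat` are assumptions for illustration; the answer does not say how the join should be done. A fresh encoder is created per column here, which is a small variation on reusing a single `lb`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data and column lists, for illustration only.
main_df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [1.5, 2.0, 3.25],
})
cat_features = ["color", "size"]
features_to_scale = ["price"]

# Cast the categorical columns to str and encode each with its own encoder.
encoded = main_df[cat_features].astype(str).apply(
    lambda col: LabelEncoder().fit_transform(col)
)

# Join the encoded columns with the untouched continuous ones on the shared index.
combined = pd.concat([encoded, main_df[features_to_scale]], axis=1)
print(combined)
```

Because both pieces keep the original index, `pd.concat(..., axis=1)` lines the rows up without an explicit merge key.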

python

pandas

scikit-learn

label-encoding