Replace string values in a pandas column with numbers using a for loop

Question

One column of my DataFrame has 10 unique values. For example, running the following on the DataFrame:

df['categories'].unique()

Output:

Electronic
Computers
Mobile Phone
Router
Food

I want to replace 'Electronic' with 1, 'Computers' with 2, 'Mobile Phone' with 3, 'Router' with 4, and 'Food' with 5. Afterwards, running the following should return the numbers instead of the strings:

df['categories'].unique()

Expected output:

1
2
3
4
5

I tried looping over df['categories'].unique(), but I could not get it to work. Can anyone help me solve this?

Answer 1

This will work:

new_vals = {'Electronic': 1, 'Computers': 2, 'Mobile Phone': 3, 'Router': 4, 'Food': 5}
df = df.replace({'categories': new_vals})
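
Since the question asks about a loop, here is a minimal sketch that builds the same kind of mapping by looping over the unique values instead of typing the dictionary by hand. The sample DataFrame is made up for illustration, and the numbers simply follow the order in which each value first appears, so this only matches the numbering in the question if the data happens to appear in that order.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real DataFrame.
df = pd.DataFrame({
    'categories': ['Electronic', 'Computers', 'Mobile Phone',
                   'Router', 'Food', 'Computers']
})

# unique() returns values in order of first appearance,
# so the first value seen gets 1, the next gets 2, and so on.
mapping = {}
for number, label in enumerate(df['categories'].unique(), start=1):
    mapping[label] = number

# map() looks each string up in the dictionary; a value missing
# from the dictionary would become NaN.
df['categories'] = df['categories'].map(mapping)
```

If the numbering has to be exactly the one from the question, an explicit dictionary like new_vals above is the safer choice.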

Answer 2

scikit-learn provides similar functionality.

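This approach works best when you are building a predictive model and the specific codes do not matter.

For example, you do not care whether the 'Computers' category ends up with the code 1, 2, or 5.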

from sklearn.preprocessing import OrdinalEncoder

enc = OrdinalEncoder()
df['categories'] = enc.fit_transform(X=df[['categories']]).astype('int')
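
By default OrdinalEncoder numbers the categories from 0 in sorted order. If the exact numbering from the question is needed, one possible sketch is to pass the order through the categories parameter and add 1 to the result. The sample DataFrame and the 'code' column name are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data, not the asker's real DataFrame.
df = pd.DataFrame({
    'categories': ['Food', 'Electronic', 'Router',
                   'Computers', 'Mobile Phone', 'Food']
})

# Desired order: position 0 is 'Electronic', position 4 is 'Food'.
order = ['Electronic', 'Computers', 'Mobile Phone', 'Router', 'Food']
enc = OrdinalEncoder(categories=[order])

# fit_transform returns a 2-D float array with codes 0..4;
# flatten it, cast to int, and shift by 1 to get 1..5.
codes = enc.fit_transform(df[['categories']]).ravel().astype(int) + 1
df['code'] = codes
```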

Answer 3

You can try this:

df['categories'] = df['categories'].astype('category').cat.codes
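
Keep in mind that cat.codes starts at 0 and follows the sorted order of the categories, not the numbering from the question. A small sketch of how to see which code belongs to which label, using a made-up sample DataFrame and a hypothetical 'code' column:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real DataFrame.
df = pd.DataFrame({
    'categories': ['Food', 'Electronic', 'Router',
                   'Computers', 'Mobile Phone', 'Food']
})

cat = df['categories'].astype('category')

# cat.categories holds the labels in sorted order; the position
# of a label in it is the integer code that label receives.
code_to_label = dict(enumerate(cat.cat.categories))

# Store the codes in a new column so the original strings are kept.
df['code'] = cat.cat.codes
```

Building code_to_label before overwriting the column makes it possible to translate the numbers back to the original strings later.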
