Categorical variables to binary variables

Question

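I have a DataFrame that looks like this: (image: initial dataframe)

The 'Concepts_clean' column holds different labels, and I would like to fill in the other label columns automatically, like this: (image: resulting dataframe)

For example, in the fourth row the 'Concepts_clean' column contains ['Accueil Amabilité', 'Tarifs'], so I want the 'Accueil Amabilité' and 'Tarifs' columns set to 1 and all the other label columns set to 0.

What is the most efficient way to do this?

Thanks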

Answer 1

This is more of an n-hot (multi-hot) encoding problem, since each row can carry several labels at once:

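>>> import pandas as pd
>>> def change_df(x):
...     # Assumes 'Concepts_clean' holds a string such as '[Tarifs, Informations]'
...     for i in x['Concepts_clean'].strip('[]').split(','):
...         label = i.strip(" '\"")  # drop surrounding spaces and quotes
...         if label:  # skip the empty label produced by '[]'
...             x[label] = 1
...     return x
...
>>> df.apply(change_df, axis=1).fillna(0)  # labels absent from a row become 0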

Sample output

Concepts_clean          Ecoute  Informations  Tarifs
[Tarifs]                 0.0           0.0     1.0
[]                       0.0           0.0     0.0
[Ecoute]                 1.0           0.0     0.0
[Tarifs, Informations]   0.0           1.0     1.0
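
Since the question asks about efficiency, here is a minimal sketch of a vectorized alternative that avoids the row-wise apply. It is not part of the original answer. The sample data and the helper names multi_hot_from_strings and multi_hot_from_lists are hypothetical, and the sketch assumes the labels themselves contain no commas and that the DataFrame index is unique. The first helper handles a column of strings such as '[Tarifs, Informations]'; the second handles a column of real Python lists.

```python
import pandas as pd

# Hypothetical sample data; the labels mirror the sample output above.
df = pd.DataFrame({
    "Concepts_clean": ["[Tarifs]", "[]", "[Ecoute]", "[Tarifs, Informations]"],
})


def multi_hot_from_strings(frame, column="Concepts_clean"):
    """Add one 0/1 column per label when the column holds bracketed strings."""
    cleaned = (
        frame[column]
        .str.strip("[]")                           # drop the surrounding brackets
        .str.replace("['\"]", "", regex=True)      # drop quotes, if any
        .str.replace(r"\s*,\s*", ",", regex=True)  # normalise the separators
        .str.strip()
    )
    # str.get_dummies splits on the separator and ignores empty labels
    dummies = cleaned.str.get_dummies(sep=",")
    return frame.join(dummies)


def multi_hot_from_lists(frame, column="Concepts_clean"):
    """Add one 0/1 column per label when the column holds Python lists."""
    exploded = frame[column].explode()  # one row per label, NaN for empty lists
    # Collapse back to one row per original index (assumed unique)
    dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
    return frame.join(dummies)


result = multi_hot_from_strings(df)

# Same data stored as real lists instead of strings
df_lists = pd.DataFrame({
    "Concepts_clean": [["Tarifs"], [], ["Ecoute"], ["Tarifs", "Informations"]],
})
result_lists = multi_hot_from_lists(df_lists)

print(result)
```

For the sample rows this should give the same 0/1 pattern as the sample output above, with integer rather than float values. Which helper applies depends on whether 'Concepts_clean' stores strings or lists, which the question does not make explicit.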

data-processing

pandas

categorical-data