Pandas: replace a single column (field) of strings with one column for each string

Question

Suppose I have the following DataFrame:

    Colors                  
0   red, white, blue
1   white, blue
2   blue, red
3   white
4   blue

Each unique value in the "Colors" column needs to become its own column, so that those columns can be populated using boolean indexing. Example:

   red  white  blue  white,blue  blue,red  red,white,blue
0    0      0     0           0         0               1
1    0      0     0           1         0               0
2    0      0     0           0         1               0
3    0      1     0           0         0               0
4    0      0     1           0         0               0

I'm looking for suggestions on how to approach this.

Answer 1

Use pd.get_dummies to get one indicator column for each distinct full string:

import pandas as pd
# One indicator column per distinct full string in 'Colors'
dummies = pd.get_dummies(df['Colors'])
print(dummies)
   blue  blue, red  red, white, blue  white  white, blue
0     0          0                 1      0            0
1     0          0                 0      0            1
2     0          1                 0      0            0
3     0          0                 0      1            0
4     1          0                 0      0            0
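
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question and assumes a reasonably recent pandas; the variable name `whole` is just illustrative. Newer pandas versions return boolean columns from pd.get_dummies by default, so the sketch passes dtype=int to get the 0/1 values shown here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Colors": ["red, white, blue", "white, blue", "blue, red", "white", "blue"]
})

# One indicator column per distinct full string.
# dtype=int forces 0/1 output; without it, recent pandas versions return True/False.
whole = pd.get_dummies(df["Colors"], dtype=int)
print(whole)
```

Each row has exactly one 1 in this result, because every row is treated as a single category.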

Or, to split each string on ', ' and get one column for each individual color:

# Split each string on ', ' and build one indicator column per color
colors = df['Colors'].str.get_dummies(', ')
print(colors)
   blue  red  white
0     1    1      1
1     1    0      1
2     1    1      0
3     0    0      1
4     1    0      0
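
Since the question mentions boolean indexing, here is a hedged sketch of how the split indicator columns might be used as a filter. It rebuilds the sample data from the question; the names `flags` and `mask` are illustrative, and it assumes every entry uses exactly ', ' as the separator (a string such as 'white,blue' with no space would not be split).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Colors": ["red, white, blue", "white, blue", "blue, red", "white", "blue"]
})

# One 0/1 column per individual color, splitting on ', '
flags = df["Colors"].str.get_dummies(sep=", ")

# Boolean mask: rows that contain blue but not red
mask = (flags["blue"] == 1) & (flags["red"] == 0)
print(df[mask])

# Optionally keep the indicator columns next to the original column
combined = df.join(flags)
print(combined)
```

If the original column should be replaced rather than kept, dropping 'Colors' from `combined` afterwards is one option.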

Tags: python, boolean-operations, pandas