是否可以基于作为列中元素的值列表进行单热？

Question

这是我正在使用的数据框：

| Name     | Vehicle Types Owned  |
| -------- | --------------       |
| Bob      | [Car, Bike]          |
| Billy    | [Car, Truck, Train]  |
| Rob      | [Plane, Train, Boat] |
| Sally    | [Bike, Boat]         |

我正在寻找这样的东西

| Name | Car | Bike | Truck | Train | Plane | Boat |
| ---- | ----| ---  | ---   | ---   | ---   | ---  |
| Bob  | 1   | 1    | 0     | 0     | 0     | 0    |                 
| Billy| 1   | 0    | 1     | 1     | 0     | 0    |
| Rob  | 0   | 0    | 0     | 1     | 1     | 1    |
| Sally| 0   | 1    | 0     | 0     | 0     | 1    |

原始数据框看起来像这样，以防使用它更有用：

| Name     | Vehicle Type Owned | Num Wheels | Top Speed | 
| -------- | --------------     | ---------  | --------- |
| Bob      | Car                | 4          | 200 mph   |
| Bob      | Bike               | 2          | 20 mph    |
| Billy    | Car                | 4          | 220 mph   |
| Billy    | Truck              | 8          | 100 mph   |
| Billy    | Train              | 80         | 86  mph   |
| Rob      | Plane              | 3          | 600 mph   |
| Rob      | Train              | 80         | 98  mph   |
| Rob      | Boat               | 3          | 128 mph   |
| Sally    | Bike               | 2          | 34  mph   |
| Sally    | Boat               | 3          | 78 mph    |

我正在使用 pandas。

Answer 1

尝试 explode 然后 crosstab

df = df.explode('Vehicle Types Owned')
df_ = pd.crosstab([df['Name']], df['Vehicle Types Owned']).reset_index().rename_axis(None, axis=1)

print(df_)

    Name  Bike  Boat  Car  Plane  Train  Truck
0  Billy     0     0    1      0      1      1
1    Bob     1     0    1      0      0      0
2    Rob     0     1    0      1      1      0
3  Sally     1     1    0      0      0      0

Answer 2

从原始数据框，您可以使用 pivot_table:

df.assign(count=1).pivot_table(index='Name', columns='Vehicle Type Owned', values='count', fill_value=0)

结果：

Vehicle Type Owned  Bike  Boat  Car  Plane  Train  Truck
Name                                                    
Billy                  0     0    1      0      1      1
Bob                    1     0    1      0      0      0
Rob                    0     1    0      1      1      0
Sally                  1     1    0      0      0      0

是否可以基于作为列中元素的值列表进行单热？

Is is possible to one-hot based on a list of values being an element in a column?

python

dataframe

pandas

one-hot-encoding