是否可以基于作为列中元素的值列表进行单热?
Is is possible to one-hot based on a list of values being an element in a column?
这是我正在使用的数据框:
| Name | Vehicle Types Owned |
| -------- | -------------- |
| Bob | [Car, Bike] |
| Billy | [Car, Truck, Train] |
| Rob | [Plane, Train, Boat] |
| Sally | [Bike, Boat] |
我正在寻找这样的东西
| Name | Car | Bike | Truck | Train | Plane | Boat |
| ---- | ----| --- | --- | --- | --- | --- |
| Bob | 1 | 1 | 0 | 0 | 0 | 0 |
| Billy| 1 | 0 | 1 | 1 | 0 | 0 |
| Rob | 0 | 0 | 0 | 1 | 1 | 1 |
| Sally| 0 | 1 | 0 | 0 | 0 | 1 |
原始数据框看起来像这样,以防使用它更有用:
| Name | Vehicle Type Owned | Num Wheels | Top Speed |
| -------- | -------------- | --------- | --------- |
| Bob | Car | 4 | 200 mph |
| Bob | Bike | 2 | 20 mph |
| Billy | Car | 4 | 220 mph |
| Billy | Truck | 8 | 100 mph |
| Billy | Train | 80 | 86 mph |
| Rob | Plane | 3 | 600 mph |
| Rob | Train | 80 | 98 mph |
| Rob | Boat | 3 | 128 mph |
| Sally | Bike | 2 | 34 mph |
| Sally | Boat | 3 | 78 mph |
我正在使用 pandas。
尝试 explode
然后 crosstab
df = df.explode('Vehicle Types Owned')
df_ = pd.crosstab([df['Name']], df['Vehicle Types Owned']).reset_index().rename_axis(None, axis=1)
print(df_)
Name Bike Boat Car Plane Train Truck
0 Billy 0 0 1 0 1 1
1 Bob 1 0 1 0 0 0
2 Rob 0 1 0 1 1 0
3 Sally 1 1 0 0 0 0
从原始数据框,您可以使用 pivot_table
:
df.assign(count=1).pivot_table(index='Name', columns='Vehicle Type Owned', values='count', fill_value=0)
结果:
Vehicle Type Owned Bike Boat Car Plane Train Truck
Name
Billy 0 0 1 0 1 1
Bob 1 0 1 0 0 0
Rob 0 1 0 1 1 0
Sally 1 1 0 0 0 0
这是我正在使用的数据框:
| Name | Vehicle Types Owned |
| -------- | -------------- |
| Bob | [Car, Bike] |
| Billy | [Car, Truck, Train] |
| Rob | [Plane, Train, Boat] |
| Sally | [Bike, Boat] |
我正在寻找这样的东西
| Name | Car | Bike | Truck | Train | Plane | Boat |
| ---- | ----| --- | --- | --- | --- | --- |
| Bob | 1 | 1 | 0 | 0 | 0 | 0 |
| Billy| 1 | 0 | 1 | 1 | 0 | 0 |
| Rob | 0 | 0 | 0 | 1 | 1 | 1 |
| Sally| 0 | 1 | 0 | 0 | 0 | 1 |
原始数据框看起来像这样,以防使用它更有用:
| Name | Vehicle Type Owned | Num Wheels | Top Speed |
| -------- | -------------- | --------- | --------- |
| Bob | Car | 4 | 200 mph |
| Bob | Bike | 2 | 20 mph |
| Billy | Car | 4 | 220 mph |
| Billy | Truck | 8 | 100 mph |
| Billy | Train | 80 | 86 mph |
| Rob | Plane | 3 | 600 mph |
| Rob | Train | 80 | 98 mph |
| Rob | Boat | 3 | 128 mph |
| Sally | Bike | 2 | 34 mph |
| Sally | Boat | 3 | 78 mph |
我正在使用 pandas。
尝试 explode
然后 crosstab
df = df.explode('Vehicle Types Owned')
df_ = pd.crosstab([df['Name']], df['Vehicle Types Owned']).reset_index().rename_axis(None, axis=1)
print(df_)
Name Bike Boat Car Plane Train Truck
0 Billy 0 0 1 0 1 1
1 Bob 1 0 1 0 0 0
2 Rob 0 1 0 1 1 0
3 Sally 1 1 0 0 0 0
从原始数据框,您可以使用 pivot_table
:
df.assign(count=1).pivot_table(index='Name', columns='Vehicle Type Owned', values='count', fill_value=0)
结果:
Vehicle Type Owned Bike Boat Car Plane Train Truck
Name
Billy 0 0 1 0 1 1
Bob 1 0 1 0 0 0
Rob 0 1 0 1 1 0
Sally 1 1 0 0 0 0