One-hot encoding a pandas column with categories that are not present in the column
My DataFrame:
Index letters
0 A
1 B
2 D
3 Z
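In Python, I want to get a one-hot encoded DataFrame of the letters column above that also includes elements which do not appear in the column, as below: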
Index A B C D E K Z
0 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0
2 0 0 0 1 0 0 0
3 0 0 0 0 0 0 1
Using get_dummies for this:
df = pd.get_dummies(df)
df.columns = df.columns.str.replace('letters_', '')
print(df)
Index A B D Z
0 0 1 0 0 0
1 1 0 1 0 0
2 2 0 0 1 0
3 3 0 0 0 1
import pandas as pd
df = pd.DataFrame(["A", "A", "C", "C", "E", "F", "G"], columns=['letters'])
all_cats = ["A", "B", "C", "D", "E", "F", "G"]
ohe = pd.get_dummies(df['letters'], sparse=True).reindex(all_cats, axis=1, fill_value=0)
>>> ohe
A B C D E F G
0 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
2 0 0 1 0 0 0 0
3 0 0 1 0 0 0 0
4 0 0 0 0 1 0 0
5 0 0 0 0 0 1 0
6 0 0 0 0 0 0 1
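For reference, the same reindex idea can be applied to the data from the question. This is only a minimal sketch: `all_letters` is an assumed name for the full category list, and `dtype=int` is passed because recent pandas versions return boolean columns from `get_dummies` by default.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"letters": ["A", "B", "D", "Z"]})

# Full set of categories we want as columns (assumed to be known up front)
all_letters = ["A", "B", "C", "D", "E", "K", "Z"]

# Encode the observed values, then add the missing categories as all-zero columns
ohe = pd.get_dummies(df["letters"], dtype=int).reindex(columns=all_letters, fill_value=0)
print(ohe)
```

Note that reindex also drops any observed value that is not listed in `all_letters`, so the list needs to cover everything that can occur in the column.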
Using merge:
df = pd.DataFrame({'Letters':['A','B', 'D', 'Z']})
all_letters = ['A','B', 'C', 'D','E','K', 'Z']
s = pd.get_dummies(all_letters)
s['Letters'] = all_letters
df2 = df.merge(s, on='Letters')
df2
which gives
| | Letters | A | B | C | D | E | K | Z |
|---:|:----------|----:|----:|----:|----:|----:|----:|----:|
| 0 | A | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | B | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | D | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Z | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
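Both answers above supply the full list of letters after or alongside the encoding. A different technique, not taken from either answer, is to declare the categories on the column itself with a categorical dtype, so that get_dummies emits a column for every declared category. The sketch below assumes the full list is known in advance; `all_letters` and `ohe_cat` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"letters": ["A", "B", "D", "Z"]})
all_letters = ["A", "B", "C", "D", "E", "K", "Z"]

# Declare every possible category, including ones not present in the data
cat_type = pd.CategoricalDtype(categories=all_letters)
letters_cat = df["letters"].astype(cat_type)

# get_dummies creates one column per declared category, observed or not
ohe_cat = pd.get_dummies(letters_cat, dtype=int)
print(ohe_cat)
```

One design consideration: values that are not in the declared categories become missing after the cast, so they end up as all-zero rows rather than raising an error.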