Expanding a list contained in a column, so that each element of the list corresponds to its own column and is represented as a binary variable
I have a dataframe that looks like this:
skill_list                   name                 profile                   561  904  468  875  737  402  882 ...
[561, 564, 632, 859]         Aaron Weidele        wordpress developer         0    0    0    0    0    0    0
[737, 399, 882, 1086, 5...]  Abdelrady Tantawy    full stack developer        0    0    0    0    0    0    0
[904, 468, 783, 1120, 8...]  Abhijeet A Mulgund   machine learning dev...     0    0    0    0    0    0    0
[468]                        Abhijeet Tiwari      salesforce programmi...     0    0    0    0    0    0    0
[518, 466, 875, 445, 402..]  Abhimanyu Veer A...  machine learning devel...   0    0    0    0    0    0    0
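The skill_list column holds a list of encoded skills for each developer. I want to expand each list in the skill_list column so that every encoded skill is represented in its own column as a binary variable (1 for on, 0 for off). The expected output is: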
skill_list                   name                 profile                   561  904  468  875  737  402  882 ...
[561, 564, 632, 859]         Aaron Weidele        wordpress developer         1    0    0    0    0    0    0
[737, 399, 882, 1086, 5...]  Abdelrady Tantawy    full stack developer        0    0    0    0    1    0    1
[904, 468, 783, 1120, 8...]  Abhijeet A Mulgund   machine learning dev...     0    1    1    0    0    0    0
[468]                        Abhijeet Tiwari      salesforce programmi...     0    0    1    0    0    0    0
[518, 466, 875, 445, 402..]  Abhimanyu Veer A...  machine learning devel...   0    0    0    0    0    1    0
I tried:
for index, row in df_vector_matrix["skill_list"].items():
    for item in row:
        for col in df_vector_matrix.columns:
            if item == col:
                df_vector_matrix.loc[item, col] = "1"
            else:
                0
Thank you very much for your help!
You can try MultiLabelBinarizer from sklearn. The example below may help.
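import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

lb = MultiLabelBinarizer()
lb_res = lb.fit_transform(df_vector_matrix['skill_list'])
# convert the result into a dataframe; reuse the original index so concat aligns row by row
res = pd.DataFrame(lb_res, columns=lb.classes_, index=df_vector_matrix.index)
# concatenate the result with the original dataframe
# (drop any pre-existing skill columns first, otherwise their names end up duplicated)
df_vector_matrix = pd.concat([df_vector_matrix, res], axis=1)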
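For comparison, the loop from the question can also be made to work. In the attempt, `.loc[item, col]` uses the skill id as the row label, so the wrong row (or a brand-new one) is written, and the value stored is the string "1" rather than the integer 1. The following is only a minimal sketch on a made-up miniature of the frame: the names `dev_a`, `dev_b`, `dev_c` and the `skill_cols` list are hypothetical, not taken from the real data.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question (names are placeholders).
df_vector_matrix = pd.DataFrame({
    "skill_list": [[561, 564], [737, 882], [468]],
    "name": ["dev_a", "dev_b", "dev_c"],
})

# Assumed set of skill columns, pre-filled with zeros as in the question.
skill_cols = [561, 904, 468, 737, 882]
for col in skill_cols:
    df_vector_matrix[col] = 0

# Use the row label (index) for the row and the skill id for the column.
for index, skills in df_vector_matrix["skill_list"].items():
    for skill in skills:
        # Skills without a matching column (564 here) are skipped.
        if skill in df_vector_matrix.columns:
            df_vector_matrix.loc[index, skill] = 1
```

This fills the existing columns in place, so nothing is duplicated, but it visits the rows one at a time and will likely be much slower than MultiLabelBinarizer on a large frame.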
Below is a sample dataframe in which the col column holds list values.
>>> import pandas as pd
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> d = {'col': [[1, 2, 3], [2, 3, 4, 5], [2]], 'name': ['abc', 'vdf', 'rt']}
>>> df = pd.DataFrame(d)
>>> df
            col name
0     [1, 2, 3]  abc
1  [2, 3, 4, 5]  vdf
2           [2]   rt
>>> lb = MultiLabelBinarizer()
>>> lb_res = lb.fit_transform(df['col'])
>>> res = pd.DataFrame(lb_res, columns=lb.classes_)
>>> pd.concat([df, res], axis=1)
            col name  1  2  3  4  5
0     [1, 2, 3]  abc  1  1  1  0  0
1  [2, 3, 4, 5]  vdf  0  1  1  1  1
2           [2]   rt  0  1  0  0  0
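If adding scikit-learn as a dependency is not desirable, roughly the same table can be built with pandas alone. This is an alternative sketch, not part of the answer above: it explodes each list into one row per element, one-hot encodes the elements, and then collapses the rows back per original index. The variable names `dummies` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [[1, 2, 3], [2, 3, 4, 5], [2]],
    "name": ["abc", "vdf", "rt"],
})

# One row per list element; the original row label is repeated in the index.
exploded = df["col"].explode()

# One-hot encode the elements, then take the max per original row label
# so each original row ends up with a single 0/1 vector.
dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Join on the index, which keeps rows aligned with the original frame.
out = df.join(dummies)
```

Because the join is on the index, this version does not rely on the frame having a default 0..n-1 index.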