String vector to list in Python
I'm working in Python, and I have a DataFrame column of strings that looks like this:
df['set']
0 [911,3040]
1 [130055, 99832, 62131]
2 [19397, 3987, 5330, 14781]
3 [76514, 70178, 70301, 76545]
4 [79185, 38367, 131155, 79433]
I want it to be:
['911','3040'],['130055','99832','62131'],['19397','3987','5330','14781'],['76514','70178','70301','76545'],['79185','38367','131155','79433']
so that I can run Word2Vec:
model = gensim.models.Word2Vec(df['set'], size=100)
Thanks!
I think you need:
model = gensim.models.Word2Vec([[str(y) for y in x] for x in df['set']], size=100)  # gensim >= 4.0: use vector_size=100
L = [[str(y) for y in x] for x in df['set']]
print(L)
[['911', '3040'],
['130055', '99832', '62131'],
['19397', '3987', '5330', '14781'],
['76514', '70178', '70301', '76545'],
['79185', '38367', '131155', '79433']]
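The comprehension above assumes each cell of df['set'] already holds a Python list of integers. If the cells are strings such as '[911,3040]', as the question's wording suggests, iterating over a cell yields single characters instead. A minimal sketch of the difference, using two hypothetical two-row DataFrames built from the question's first values:

```python
import pandas as pd

# Hypothetical frames: one column of real lists, one of list-like strings
df_lists = pd.DataFrame({"set": [[911, 3040], [130055, 99832, 62131]]})
df_strings = pd.DataFrame({"set": ["[911,3040]", "[130055, 99832, 62131]"]})

# The same comprehension applied to both columns
from_lists = [[str(y) for y in x] for x in df_lists["set"]]
from_strings = [[str(y) for y in x] for x in df_strings["set"]]

print(from_lists)       # [['911', '3040'], ['130055', '99832', '62131']]
print(from_strings[0])  # ['[', '9', '1', '1', ',', '3', '0', '4', '0', ']']
```

If you see one-character tokens like these, the column needs to be parsed first, as in the next answer.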
If you have a column of strings, I'd suggest looking at the different ways to parse them.
Here's how I would do it, using ast.literal_eval:
>>> import ast
>>> [list(map(str, x)) for x in df['set'].apply(ast.literal_eval)]
Or, using pd.eval:
>>> [list(map(str, x)) for x in df['set'].apply(pd.eval)]  # 100 rows or fewer
Or, using yaml.safe_load:
>>> import yaml
>>> [list(map(str, x)) for x in df['set'].apply(yaml.safe_load)]  # plain yaml.load needs a Loader argument in PyYAML 6+
[
['911', '3040'],
['130055', '99832', '62131'],
['19397', '3987', '5330', '14781'],
['76514', '70178', '70301', '76545'],
['79185', '38367', '131155', '79433']
]
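For reference, a self-contained version of the ast.literal_eval approach, assuming the cells are list-like strings exactly as printed in the question:

```python
import ast

import pandas as pd

# Assumption: the cells are strings that look like Python lists
df = pd.DataFrame({"set": [
    "[911,3040]",
    "[130055, 99832, 62131]",
    "[19397, 3987, 5330, 14781]",
    "[76514, 70178, 70301, 76545]",
    "[79185, 38367, 131155, 79433]",
]})

# literal_eval parses each string into a list of ints;
# map(str, ...) then turns every int back into a string token
sentences = [list(map(str, x)) for x in df["set"].apply(ast.literal_eval)]
print(sentences)
```

Unlike eval, ast.literal_eval only accepts Python literals, so it will not execute arbitrary code that happens to be stored in the column.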
To create a new column (str_set) with the items of the set column converted to strings:
df["str_set"] = [[str(item) for item in df.loc[row, "set"]] for row in df.index]
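The same new column can also be built without any row indexing by mapping over the Series. A sketch using Series.apply; the index labels 'a' and 'b' are hypothetical and only show that a non-default index is preserved:

```python
import pandas as pd

# Hypothetical frame with a non-default index; the cells are real lists
df = pd.DataFrame(
    {"set": [[911, 3040], [130055, 99832, 62131]]},
    index=["a", "b"],
)

# Series.apply converts each cell and keeps the original index labels
df["str_set"] = df["set"].apply(lambda items: [str(item) for item in items])

print(df.loc["b", "str_set"])  # ['130055', '99832', '62131']
```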
Convert each element to a string with a simple list comprehension, overwriting the old column:
df['set'] = [[str(i) for i in row] for row in df['set']]
Running it on the provided data:
data_col = [911,3040], [130055, 99832, 62131], [19397, 3987, 5330, 14781], [76514, 70178, 70301, 76545],[79185, 38367, 131155, 79433]
out = [[str(i) for i in row] for row in data_col]
out
[['911', '3040'],
['130055', '99832', '62131'],
['19397', '3987', '5330', '14781'],
['76514', '70178', '70301', '76545'],
['79185', '38367', '131155', '79433']]
I'm not sure this is the fastest approach for a large dataset, since it involves a lot of iteration.
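If the cells are strings like '[911,3040]' rather than real lists, one alternative (not taken from the answers above) skips the round trip through integers and pulls out the digit runs with the pandas .str accessor. A minimal sketch, assuming every token is a non-negative integer (a minus sign would be dropped); whether it is actually faster on a large dataset would need to be measured:

```python
import pandas as pd

# Assumption: cells are list-like strings of non-negative integers
df = pd.DataFrame({"set": [
    "[911,3040]",
    "[130055, 99832, 62131]",
    "[19397, 3987, 5330, 14781]",
]})

# Each match of \d+ is one run of digits, already returned as a string
tokens = df["set"].str.findall(r"\d+").tolist()
print(tokens)  # [['911', '3040'], ['130055', '99832', '62131'], ['19397', '3987', '5330', '14781']]
```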