Split list in a column to multiple columns

Question

I have two DataFrames as follows:

df1.shape = (4,2)

Text	Topic
Where is the party tonight?	Party
Let's dance	Party
Hello world	Other
It is rainy today	Weather

df2.shape = (4,2)

0	1
Where is the party tonight?	[-0.011570500209927559, -0.010117080062627792,….,0.062448356]
Let's dance	[-0.08268199861049652, -0.0016140303341671824,….,0.02094201]
Hello world	[-0.0637684240937233, -0.01590338535606861,….,0.02094201]
It is rainy today	[0.06379614025354385, -0.02878064103424549,….,0.056790903]

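Basically, df2 holds the embedding of each sentence in df1, and each sentence in df1 has a topic associated with it. The embeddings are in 'column 1' of df2, where every row holds a list of 512 positive or negative floats.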

The output DataFrame I want is:

df_output.shape = (4,514)

Text	Topic	Feature_0	Feature_1	….	Feature_511
Where is the party tonight?	Party	-0.0115705	-0.01011708	….	0.0624484
Let's dance	Party	-0.082681999	-0.00161403	….	0.020942
Hello world	Other	-0.063768424	-0.01590338535606861	….	0.020942
It is rainy today	Weather	0.06379614	-0.028780641	….	0.056790903

How can I accomplish this? I tried to split the embeddings in df2 into columns, but it is not working for me. This is what I have done so far:

df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))

It just created a duplicate of column 1 as feature_0. I haven't even reached the stage where I can join df1 and df2.

Answer 1

How about this?

import pandas as pd

data = {0: ['Where is the party tonight?', "Let's dance", 'Hello world', 'It is rainy today'],
1:[[-0.011570500209927559, -0.010117080062627792,0.062448356],[-0.08268199861049652, -0.0016140303341671824,0.02094201],[-0.0637684240937233, -0.01590338535606861,0.02094201],[0.06379614025354385, -0.02878064103424549,0.056790903]]}
df = pd.DataFrame(data)

# note: no .values here; to_list() is called on the Series directly
exploded_df = pd.DataFrame(df[1].to_list(), index=df.index)

print(exploded_df.head())

Output:

          0         1         2
0 -0.011571 -0.010117  0.062448
1 -0.082682 -0.001614  0.020942
2 -0.063768 -0.015903  0.020942
3  0.063796 -0.028781  0.056791

Then you can join the two DataFrames.
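
To make that last step concrete, here is a minimal sketch of the join, assuming df1 and df2 list the sentences in the same order and share the default index. The frames below are small stand-ins for the ones in the question (3 values per embedding instead of 512), and the Feature_ prefix is taken from the desired output.

```python
import pandas as pd

# Stand-ins for the question's frames: 3 values per embedding instead of 512.
texts = ["Where is the party tonight?", "Let's dance", "Hello world", "It is rainy today"]
df1 = pd.DataFrame({"Text": texts, "Topic": ["Party", "Party", "Other", "Weather"]})
df2 = pd.DataFrame({
    0: texts,
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903],
    ],
})

# One column per list element; reuse df2's index so the rows stay aligned.
features = pd.DataFrame(df2[1].to_list(), index=df2.index).add_prefix("Feature_")

# join aligns on the index, which is why the same row order is assumed here.
df_output = df1.join(features)

print(df_output.shape)  # (4, 5): Text, Topic and three feature columns
```

If the two frames might not be in the same order, merging on the sentence text instead of the index would be the safer choice.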

Answer 2

You can map ast.literal_eval over the items in df2["1"] (this assumes each list is stored as a string), build a DataFrame from the result, and join it to df1:

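import ast
import pandas as pd

# parse each string such as "[-0.01, 0.06]" into a real list, then expand to columns
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))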

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
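
One likely explanation for the symptom in the question (an assumption, since the question does not show the column's dtype) is that the lists were stored as strings, for example after a round trip through CSV. The sketch below shows the difference; parse_embedding is a hypothetical helper, not part of pandas, that accepts either form.

```python
import ast
import pandas as pd

def parse_embedding(cell):
    """Hypothetical helper: return a list whether the cell is a list or its string form."""
    return ast.literal_eval(cell) if isinstance(cell, str) else list(cell)

# Two rows whose embeddings are stored as strings rather than lists.
df2 = pd.DataFrame({
    "0": ["Let's dance", "Hello world"],
    "1": [
        "[-0.08268199861049652, -0.0016140303341671824, 0.02094201]",
        "[-0.0637684240937233, -0.01590338535606861, 0.02094201]",
    ],
})

# Without parsing, each string stays a single cell: one column, not three.
unparsed = pd.DataFrame(df2["1"].tolist())
print(unparsed.shape)  # (2, 1)

# After parsing, every list element gets its own column.
parsed = pd.DataFrame(
    df2["1"].map(parse_embedding).tolist(), index=df2.index
).add_prefix("feature_")
print(parsed.shape)  # (2, 3)
```

Checking type(df2["1"].iloc[0]) on the real data shows which case applies before choosing between the two answers.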
