Split list in a column to multiple columns
I have two DataFrames as follows:
df1.shape = (4,2)
Text | Topic |
---|---|
Where is the party tonight? | Party |
Let's dance | Party |
Hello world | Other |
It is rainy today | Weather |
df2.shape = (4,2)
0 | 1 |
---|---|
Where is the party tonight? | [-0.011570500209927559, -0.010117080062627792,….,0.062448356] |
Let's dance | [-0.08268199861049652, -0.0016140303341671824,….,0.02094201] |
Hello world | [-0.0637684240937233, -0.01590338535606861,….,0.02094201] |
It is rainy today | [0.06379614025354385, -0.02878064103424549,….,0.056790903] |
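Basically, df2 holds the embedding of each sentence in df1, and df1 has a topic associated with each sentence. The embeddings are in 'column 1' of df2, where each cell is a list of 512 positive or negative numbers.
The output DataFrame I want is: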
df_output.shape = (4,514)
Text | Topic | Feature_0 | Feature_1 | …. | Feature_511 |
---|---|---|---|---|---|
Where is the party tonight? | Party | -0.0115705 | -0.01011708 | …. | 0.0624484 |
Let's dance | Party | -0.082681999 | -0.00161403 | …. | 0.020942 |
Hello world | Other | -0.063768424 | -0.01590338535606861 | …. | 0.020942 |
It is rainy today | Weather | 0.06379614 | -0.028780641 | …. | 0.056790903 |
How can I accomplish this? I tried to split the embeddings in df2 into columns, but it did not work for me. This is what I have done so far:
df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))
It just created a duplicate of column 1 as feature_0. I have not even reached the stage where I can join df1 and df2.
How about this?
import pandas as pd
data = {0: ['Where is the party tonight?', "Let's dance", 'Hello world', 'It is rainy today'],
1:[[-0.011570500209927559, -0.010117080062627792,0.062448356],[-0.08268199861049652, -0.0016140303341671824,0.02094201],[-0.0637684240937233, -0.01590338535606861,0.02094201],[0.06379614025354385, -0.02878064103424549,0.056790903]]}
df = pd.DataFrame(data)
# note: no .values here; call to_list() on the Series directly
exploded_df = pd.DataFrame(df[1].to_list(), index=df.index)
print(exploded_df.head())
Output:
0 1 2
0 -0.011571 -0.010117 0.062448
1 -0.082682 -0.001614 0.020942
2 -0.063768 -0.015903 0.020942
3 0.063796 -0.028781 0.056791
Then you can join the two DataFrames.
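The join step mentioned above could look roughly like the sketch below. It uses toy frames with 3 values per embedding instead of 512, and it assumes df2's columns are labelled with the integers 0 and 1 (if they are the strings "0" and "1", as in the question's attempt, adjust the keys). The names `features`, `keyed`, `df_output` and `df_merged` are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the question's frames; 3 values per embedding instead of 512.
df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance", "Hello world", "It is rainy today"],
    "Topic": ["Party", "Party", "Other", "Weather"],
})
df2 = pd.DataFrame({
    0: df1["Text"],
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903],
    ],
})

# One column per list element, keeping df2's index so rows stay aligned.
features = pd.DataFrame(df2[1].to_list(), index=df2.index).add_prefix("Feature_")

# Assumption: df1 and df2 are in the same row order, so an index join is enough.
df_output = df1.join(features)

# If the row order is not guaranteed, merging on the sentence text is an alternative.
keyed = pd.concat([df2[[0]].rename(columns={0: "Text"}), features], axis=1)
df_merged = df1.merge(keyed, on="Text", how="left")

print(df_output.shape)
```

With the real 512-element lists the same code should give a frame of shape (4, 514). The merge variant only makes sense if the sentences in df2 are unique.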
You can map ast.literal_eval over the items in df2["1"], build a DataFrame from the result, and join it to df1:
import ast
import pandas as pd
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))
Output:
Text Topic feature_0 feature_1 feature_2
0 Where is the party tonight? Party -0.011571 -0.010117 0.062448
1 Let's dance Party -0.082682 -0.001614 0.020942
2 Hello world Other -0.063768 -0.015903 0.020942
3 It is rainy today Weather 0.063796 -0.028781 0.056791
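Parsing with ast.literal_eval only helps if the cells are strings that look like lists, for example after a round trip through CSV. The question does not show the column's dtype, so that is an assumption here. The sketch below uses a hypothetical two-row df2 and a made-up helper, `to_list`, to show the single-column symptom and a parse step that also tolerates cells that are already lists.

```python
import ast
import pandas as pd

def to_list(cell):
    """Return the cell as a list, parsing it first if it is a string."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

# Hypothetical df2 whose embedding column holds text rather than real lists.
df2 = pd.DataFrame({
    "0": ["Hello world", "It is rainy today"],
    "1": [
        "[-0.0637684240937233, -0.01590338535606861, 0.02094201]",
        "[0.06379614025354385, -0.02878064103424549, 0.056790903]",
    ],
})

# Without parsing, each string is a single scalar, so only one column comes out.
unparsed = pd.DataFrame(df2["1"].tolist()).add_prefix("feature_")

# With parsing, each list element gets its own column.
parsed = pd.DataFrame(df2["1"].map(to_list).tolist(), index=df2.index).add_prefix("feature_")

print(unparsed.shape, parsed.shape)
```

A quick way to check which case applies is `type(df2["1"].iloc[0])`: if it reports `str`, the parse step is needed; if it reports `list`, `to_list()` alone should be enough.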