Add a 2-D tensor as a column of a dataframe

My dataframe looks like this:

   INCIDENT_NUMBER                      
0  INC000030884498
1  INC000029956111
2  INC000029555353
3  INC000029555338

For the four incidents above, I also have a 2-D tensor:

  sample_concatenated_embedding=
  tensor(                    
  [[ 0.6993, -0.1427, -0.1532,  ...,  0.8386,  0.5151,  0.8906],
  [ 0.7382, -0.8497,  0.1363,  ...,  0.8054,  0.5432,  0.9082],
  [ 0.0835, -0.2431, -0.0815,  ...,  0.8025,  0.5217,  0.9041],
  [-0.0346, -0.2396, -0.5831,  ...,  0.7591,  0.6138,  0.9649]],
  grad_fn=<ViewBackward>)

The embedding has size [4, 161280].

I want to insert the tensor into the four consecutive rows of my dataframe, one tensor row per dataframe row.

The final dataframe should look like this:

   INCIDENT_NUMBER      embedding         
0  INC000030884498      [ 0.6993, -0.1427, -0.1532,  ...,  0.8386,  0.5151,  0.8906]
1  INC000029956111      [ 0.7382, -0.8497,  0.1363,  ...,  0.8054,  0.5432,  0.9082]
2  INC000029555353      [ 0.0835, -0.2431, -0.0815,  ...,  0.8025,  0.5217,  0.9041]
3  INC000029555338      [-0.0346, -0.2396, -0.5831,  ...,  0.7591,  0.6138,  0.9649]

If the tensor were a Series, I could simply use the command below:

 my_dataframe['embedding'] = sample_concatenated_embedding

I can also use a for loop and insert the rows into the dataframe easily, like this:

 empty_dataframe = pd.DataFrame(columns=['incident','embedding'])

 for item in range(0,4):
     INCIDENT_NUMBER = my_dataframe['INCIDENT_NUMBER'].iloc[item]
     temp_df = pd.DataFrame([[INCIDENT_NUMBER, sample_concatenated_embedding[item]]], columns=['incident','embedding'])
     frames = [empty_dataframe, temp_df]
     empty_dataframe = pd.concat(frames)

But the for loop is inefficient. Is there a shorter way to reach the final goal?

If the row order of INCIDENT_NUMBER matches the row order of sample_concatenated_embedding, you can convert sample_concatenated_embedding to a list and then assign it to a new column, like this:

import pandas as pd


df = pd.DataFrame({'INCIDENT_NUMBER': ['INC000030884498', 'INC000029956111', 'INC000029555353', 'INC000029555338']})

data = [[ 0.6993, -0.1427, -0.1532, 0.8386,  0.5151,  0.8906],
        [ 0.7382, -0.8497,  0.1363, 0.8054,  0.5432,  0.9082],
        [ 0.0835, -0.2431, -0.0815, 0.8025,  0.5217,  0.9041],
        [-0.0346, -0.2396, -0.5831, 0.7591,  0.6138,  0.9649]]

df['embedding'] = data
df.rename(columns={'INCIDENT_NUMBER': 'incident'}, inplace=True)
print(df)

          incident                                            embedding
0  INC000030884498   [0.6993, -0.1427, -0.1532, 0.8386, 0.5151, 0.8906]
1  INC000029956111    [0.7382, -0.8497, 0.1363, 0.8054, 0.5432, 0.9082]
2  INC000029555353   [0.0835, -0.2431, -0.0815, 0.8025, 0.5217, 0.9041]
3  INC000029555338  [-0.0346, -0.2396, -0.5831, 0.7591, 0.6138, 0.9649]
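
The example above starts from a plain Python list. For an actual tensor, a minimal sketch of the same idea might look like the following. It assumes the tensor has already been converted to a NumPy array (for a PyTorch tensor that carries a grad_fn, this is typically done with `.detach().cpu().numpy()`); the small `embeddings` array here is made-up stand-in data, not the real 161280-wide embedding.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'INCIDENT_NUMBER': ['INC000030884498', 'INC000029956111',
                                       'INC000029555353', 'INC000029555338']})

# Stand-in data (assumption). With the PyTorch tensor from the question,
# one would first convert it, for example:
#   embeddings = sample_concatenated_embedding.detach().cpu().numpy()
embeddings = np.arange(24, dtype=np.float32).reshape(4, 6)

# Rows are paired by position, so the row counts must agree.
assert len(df) == embeddings.shape[0]

# list() over a 2-D array yields one 1-D array per row, so each cell of the
# new column holds a whole embedding vector.
df['embedding'] = list(embeddings)

print(df['embedding'].iloc[0].shape)
```

Keeping each row as a NumPy array rather than a nested Python list avoids creating one Python float object per value, which may matter with rows as wide as the ones in the question.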