Merge and create multiple columns based on the number of columns present in the DataFrame - Pandas

I have a column that contains comma-separated numbers, and these values should now be split into new columns.

 Site       UserId
   ABC           '456,567,67,96'
   DEF           '67,987'
 

The new DataFrame should look like this:

Site     UserId              UserId1  UserId2  UserId3  UserId4
ABC     '456,567,67,96'      456       567      67        96
DEF     '67,987'             67        987
POC     '4321,96,912'        4321      96       912

Next to each new column there should also be empty columns, used to map each number to its name and phone number from the Users table:

 UserId  UserName       PhoneNo
 4321    EB_Meter       9980688666
 987     EB_Meter987    9255488721
 912     DG_Meter912    8897634219
 567     Ups_Meter567   7263193155
 456     Ups_Meter456   8987222112
 96      DG_Meter96
 67      DGB_Meter

So the final DataFrame is:

Values           Value1  Name1         Phone1      Value2  Name2         Phone2      Value3  Name3        Phone3      Value4  Name4       Phone4
'456,567,67,96'  456     Ups_Meter456  8987222112  567     Ups_Meter567  7263193155  67      DGB_Meter                96      DG_Meter96
'67,987'         67      DGB_Meter                 987     EB_Meter987   9255488721
'4321,96,912'    4321    EB_Meter      9980688666  96      DG_Meter96                912     DG_Meter912  8897634219

Several columns are added for each UserId here, so instead of map, use melt followed by a left join in merge; the result is then reshaped back to wide format with DataFrame.pivot:

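# Cast the lookup key to str so it matches the string pieces produced by str.split
df2['UserId'] = df2['UserId'].astype(str)
# Drop the surrounding quotes and split into one column per number
df3 = df1['UserId'].str.strip("'").str.split(',', expand=True)

# Long format -> look up name and phone -> back to wide, grouped by position
df3 = (df3.reset_index()
          .melt('index', value_name='UserId')
          .merge(df2, on='UserId', how='left')
          .pivot(index='index', columns='variable')
          .sort_index(axis=1, level=1, sort_remaining=False)
          )
# Flatten the MultiIndex columns to names such as UserId_1, UserName_1, PhoneNo_1
df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1] + 1}')

df = df1.join(df3)
print(df)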
  Site         UserId UserId_1    UserName_1   PhoneNo_1 UserId_2  \
0  ABC  456,567,67,96      456  Ups_Meter456  8987222112      567   
1  DEF         67,987       67     DGB_Meter         NaN      987   

     UserName_2   PhoneNo_2 UserId_3 UserName_3 PhoneNo_3 UserId_4  \
0  Ups_Meter567  7263193155       67  DGB_Meter       NaN       96   
1   EB_Meter987  9255488721     None        NaN       NaN     None   

   UserName_4 PhoneNo_4  
0  DG_Meter96       NaN  
1         NaN       NaN  
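
The snippet above expects df1 and df2 to exist already. A minimal sketch of how they could be built from the sample data in the question follows. It rests on a few assumptions that the thread does not confirm: the quotes are literally part of the UserId strings, UserId in the Users table is stored as integers (which is why the answer casts it with astype(str) before merging), and PhoneNo is stored as text with None for the missing numbers.

```python
import pandas as pd

# Sample input from the question; the single quotes are assumed to be part of the data
df1 = pd.DataFrame({
    "Site": ["ABC", "DEF"],
    "UserId": ["'456,567,67,96'", "'67,987'"],
})

# Users lookup table from the question; integer UserId is an assumption
df2 = pd.DataFrame({
    "UserId": [4321, 987, 912, 567, 456, 96, 67],
    "UserName": ["EB_Meter", "EB_Meter987", "DG_Meter912", "Ups_Meter567",
                 "Ups_Meter456", "DG_Meter96", "DGB_Meter"],
    # Stored as text (assumption) so missing numbers do not turn the column into floats
    "PhoneNo": ["9980688666", "9255488721", "8897634219", "7263193155",
                "8987222112", None, None],
})
```

With the keys left as integers in df2 and strings in the split result, merge would refuse to join the two columns, so the cast to str is needed before the lookup.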

    

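You can use the following, provided the longest list contains exactly four values (otherwise the number of new columns will not match the four names on the left):

df[['UserId1', 'UserId2', 'UserId3', 'UserId4']] = df['UserId'].str.split(",", expand=True)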
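
If the number of values per row is not known in advance, the column names can be generated from the split result instead of being hard-coded. This is a small sketch rather than part of the original answer; the sample frame and the name parts are illustrative assumptions, and the UserId strings here are assumed to carry no surrounding quotes.

```python
import pandas as pd

# Illustrative sample data (assumed to have no surrounding quotes)
df = pd.DataFrame({
    "Site": ["ABC", "DEF"],
    "UserId": ["456,567,67,96", "67,987"],
})

# One column per comma-separated value; shorter rows are padded with missing values
parts = df["UserId"].str.split(",", expand=True)

# Name the columns UserId1, UserId2, ... for however many columns the split produced
parts.columns = [f"UserId{i + 1}" for i in range(parts.shape[1])]

df = df.join(parts)
```

The split values remain strings, so they would need converting before any numeric use or before joining against an integer key.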