Merge and create multiple columns based on the number of columns present in the DataFrame - Pandas
I have a column of comma-separated numbers, and these values should now be split into new columns.
Site UserId
ABC '456,567,67,96'
DEF '67,987'
The new DataFrame should look like this:
Site UserID UserId1 UserId2 UserId3 UserId4
ABC '456,567,67,96' 456 567 67 96
DEF '67,987' 67 987
POC '4321,96,912' 4321 96 912
There is also an empty column next to each of these columns, used to map the numbers to names and phone numbers.
Users
UserId UserName PhoneNo
4321 EB_Meter 9980688666
987 EB_Meter987 9255488721
912 DG_Meter912 8897634219
567 Ups_Meter567 7263193155
456 Ups_Meter456 8987222112
96 DG_Meter96
67 DGB_Meter
So the final DataFrame is:
Values Value1 Name1 Phone1 Value2 Name2 Phone2 Value3 Name3 Phone3 Value4 Name4 Phone4
'456,567,67,96' 456 Ups_Meter456 8987222112 567 Ups_Meter567 7263193155 67 DGB_Meter 96 DG_Meter96
'67,987' 67 DGB_Meter 987 EB_Meter987 9255488721
'4321,96,912' 4321 EB_Meter 9980688666 96 DG_Meter96 912 DG_Meter912 8897634219
Multiple columns are added here for each UserId, so use DataFrame.melt with a left join in merge; the reshaping back to wide format is done by DataFrame.pivot:
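# Match the dtype of the split values, which are strings
df2['UserId'] = df2['UserId'].astype(str)
# One column per ID position (0, 1, 2, ...); shorter rows are padded with None
df3 = df1['UserId'].str.strip("'").str.split(',', expand=True)
df3 = (df3.reset_index()
          .melt('index', value_name='UserId')        # long format: one row per ID
          .merge(df2, on='UserId', how='left')       # look up UserName and PhoneNo
          .pivot(index='index', columns='variable')  # back to one row per site
          .sort_index(axis=1, level=1, sort_remaining=False)  # group columns by position
      )
# Flatten the MultiIndex columns, e.g. ('UserName', 0) -> 'UserName_1'
df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1] + 1}')
df = df1.join(df3)
print(df)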
Site UserId UserId_1 UserName_1 PhoneNo_1 UserId_2 \
0 ABC 456,567,67,96 456 Ups_Meter456 8987222112 567
1 DEF 67,987 67 DGB_Meter NaN 987
UserName_2 PhoneNo_2 UserId_3 UserName_3 PhoneNo_3 UserId_4 \
0 Ups_Meter567 7263193155 67 DGB_Meter NaN 96
1 EB_Meter987 9255488721 None NaN NaN None
UserName_4 PhoneNo_4
0 DG_Meter96 NaN
1 NaN NaN
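To see what the melt and merge steps produce before the pivot, here is a minimal self-contained sketch. The sample frames are reconstructed from the question, and storing PhoneNo as strings is an assumption. The names long_df and position are illustrative only, and the dropna call is an addition that is not part of the answer above.

```python
import pandas as pd

# Sample frames reconstructed from the question (assumed shape of the real data).
df1 = pd.DataFrame({
    "Site": ["ABC", "DEF"],
    "UserId": ["'456,567,67,96'", "'67,987'"],
})
df2 = pd.DataFrame({
    "UserId": [4321, 987, 912, 567, 456, 96, 67],
    "UserName": ["EB_Meter", "EB_Meter987", "DG_Meter912", "Ups_Meter567",
                 "Ups_Meter456", "DG_Meter96", "DGB_Meter"],
    # Assumption: phone numbers kept as strings, missing ones as None.
    "PhoneNo": ["9980688666", "9255488721", "8897634219", "7263193155",
                "8987222112", None, None],
})
df2["UserId"] = df2["UserId"].astype(str)

# Long format: one row per (original row, position) pair.
long_df = (
    df1["UserId"].str.strip("'").str.split(",", expand=True)
    .reset_index()
    .melt("index", var_name="position", value_name="UserId")
    .dropna(subset=["UserId"])            # drop padding from shorter rows
    .merge(df2, on="UserId", how="left")  # attach UserName and PhoneNo
)
print(long_df.sort_values(["index", "position"]))
```

Each row of long_df holds one ID together with its looked-up name and phone number, which is the shape that pivot then spreads back into UserId_1, UserName_1, PhoneNo_1 and so on.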
You can use:
df[['UserId1', 'UserId2', 'UserId3', 'UserId4']] = df['UserId'].str.split(",", expand=True)
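The one-line split above hard-codes four target columns, so it fails when the longest row has a different number of IDs. A possible variation, sketched here on sample data taken from the question, derives the column names from the split result instead. The variable name parts is illustrative, and stripping the surrounding quotes is an assumption about how the values are stored.

```python
import pandas as pd

# Sample data from the question (quoted, comma-separated IDs).
df = pd.DataFrame({
    "Site": ["ABC", "DEF"],
    "UserId": ["'456,567,67,96'", "'67,987'"],
})

# Split into as many columns as the longest row needs.
parts = df["UserId"].str.strip("'").str.split(",", expand=True)
# Name the columns UserId1, UserId2, ... based on the actual width.
parts.columns = [f"UserId{i + 1}" for i in range(parts.shape[1])]

df = df.join(parts)
print(df)
```

Note that this only splits the IDs; it does not add the name and phone columns that the question asks for.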