Is there a pandas function to transpose a data frame to create a separate column for each unique value of an existing column?
I want to reshape my data frame into a format that is easier to analyze. Currently it looks like this:
Carrier | Service | Weight | Area | Charge
A | GRND | 1 | 2 | $5.0
A | GRND | 2 | 2 | $6.0
A | GRND | 3 | 2 | $7.0
B | GRND | 1 | 2 | $5.5
B | GRND | 3 | 2 | $6.9
I want to convert my data into the following format:
Service | Weight | Area | CarrierA_Charge | CarrierB_Charge
GRND | 1 | 2 | $5.0 | $5.5
GRND | 2 | 2 | $6.0 | NA
GRND | 3 | 2 | $7.0 | $6.9
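Ultimately, my goal is to create a column that gives me the carrier with the lowest charge for each unique combination of Service, Weight, and Area, like so:
Service | Weight | Area | CarrierA_Charge | CarrierB_Charge | min_charge | min_charge_carrier
GRND | 1 | 2 | $5.0 | $5.5 | $5.0 | A
GRND | 2 | 2 | $6.0 | NA | $6.0 | A
GRND | 3 | 2 | $7.0 | $6.9 | $6.9 | B
Is there a built-in pandas function that does this, or how could I write a function in Python to do it?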
IIUC:
d = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
d.rename(columns=f'{d.columns.name}{{}}_Charge'.format) \
    .reset_index().rename_axis(None, axis=1)
Service Weight Area CarrierA_Charge CarrierB_Charge
0 GRND 1 2 5.0 5.5
1 GRND 2 2 6.0 NaN
2 GRND 3 2 7.0 6.9
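For anyone who wants to reproduce the output above, here is a minimal, self-contained sketch. It assumes the question's Charge column holds strings such as "$5.0" (the sample frame is reconstructed from the question, and the conversion to float is an added step, not part of the answer as posted):

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frame).
df = pd.DataFrame({
    "Carrier": ["A", "A", "A", "B", "B"],
    "Service": ["GRND"] * 5,
    "Weight": [1, 2, 3, 1, 3],
    "Area": [2] * 5,
    "Charge": ["$5.0", "$6.0", "$7.0", "$5.5", "$6.9"],
})

# Strip the literal "$" and convert to float so numeric operations work later.
df["Charge"] = df["Charge"].str.replace("$", "", regex=False).astype(float)

# Move Carrier from the row index into the columns: one column per carrier.
d = df.set_index(["Service", "Weight", "Area", "Carrier"])["Charge"].unstack()

wide = (
    d.rename(columns=f"{d.columns.name}{{}}_Charge".format)  # 'A' -> 'CarrierA_Charge'
     .reset_index()
     .rename_axis(None, axis=1)  # drop the leftover 'Carrier' label on the columns
)
print(wide)
```

Note that `unstack` raises an error if the same Service/Weight/Area/Carrier combination appears more than once; in that case an aggregating approach such as `pivot_table` is the safer choice.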
Slightly different formatting, plus the additional columns:
d0 = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
# row-wise minimum charge and the carrier (column label) that offers it
d1 = pd.concat(dict(min_charge=d0.min(axis=1), min_charge_carrier=d0.idxmin(axis=1)), axis=1)
fmt = f'{d0.columns.name}{{}}_Charge'.format
d0.rename(columns=fmt).join(d1).reset_index().rename_axis(None, axis=1)
Service Weight Area CarrierA_Charge CarrierB_Charge min_charge min_charge_carrier
0 GRND 1 2 5.0 5.5 5.0 A
1 GRND 2 2 6.0 NaN 6.0 A
2 GRND 3 2 7.0 6.9 6.9 B
The pivot_table approach:
# pivot table
pivot = df.pivot_table(columns='Carrier', index=['Service', 'Weight', 'Area'], values='Charge',
                       aggfunc='min').reset_index()
# rename columns here
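The renaming step left open above could be filled in along these lines. This is only a sketch: the sample frame is reconstructed from the question with Charge already converted to float, and the `Carrier…_Charge` naming simply mirrors the format the question asks for.

```python
import pandas as pd

# Sample data reconstructed from the question, with Charge assumed numeric.
df = pd.DataFrame({
    "Carrier": ["A", "A", "A", "B", "B"],
    "Service": ["GRND"] * 5,
    "Weight": [1, 2, 3, 1, 3],
    "Area": [2] * 5,
    "Charge": [5.0, 6.0, 7.0, 5.5, 6.9],
})

# One row per Service/Weight/Area, one column per carrier; duplicates collapse to the minimum.
pivot = df.pivot_table(index=["Service", "Weight", "Area"],
                       columns="Carrier",
                       values="Charge",
                       aggfunc="min")

# Rename the carrier columns before reset_index so the key columns are left untouched.
pivot.columns = [f"Carrier{carrier}_Charge" for carrier in pivot.columns]
pivot = pivot.reset_index()
print(pivot)
```

Renaming before `reset_index` avoids having to special-case the Service, Weight, and Area columns.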
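To answer your question fully, including the extra columns:
First we create the pivot table and rename the columns accordingly:
Step 1: pivot and rename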
pivot = df.pivot_table(index=['Service', 'Weight', 'Area'],
                       columns='Carrier',
                       values='Charge',
                       aggfunc=lambda x: ' '.join(x))
pivot.columns = [pivot.columns.name + col + '_Charge' for col in pivot.columns]
pivot.reset_index(inplace=True)
Service Weight Area CarrierA_Charge CarrierB_Charge
0 GRND 1 2 $5.0 $5.5
1 GRND 2 2 $6.0 NaN
2 GRND 3 2 $7.0 $6.9
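Step 2: create the extra columns:
import numpy as np
cols = ['CarrierA_Charge', 'CarrierB_Charge']
for col in cols:
    # regex=False: '$' is a regex anchor, so treat it as a literal character
    pivot[col] = pivot[col].str.replace('$', '', regex=False).astype(float)
pivot['min_charge'] = pivot[cols].min(axis=1)
# with only two carriers: 'A' if A's charge is the minimum, otherwise 'B'
pivot['min_charge_carrier'] = np.where(pivot['min_charge'].eq(pivot['CarrierA_Charge']),
                                       'A', 'B')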
Service Weight Area CarrierA_Charge CarrierB_Charge min_charge min_charge_carrier
0 GRND 1 2 5.0 5.5 5.0 A
1 GRND 2 2 6.0 NaN 6.0 A
2 GRND 3 2 7.0 6.9 6.9 B
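The `np.where` call above hard-codes the two carriers A and B. If there may be more carriers, one possible generalization (a sketch, not part of the answer as posted) is to let `idxmin` pick the cheapest column and then strip the assumed `Carrier…_Charge` naming back down to the carrier label:

```python
import numpy as np
import pandas as pd

# The pivoted frame from the steps above, rebuilt here so the example is self-contained.
pivot = pd.DataFrame({
    "Service": ["GRND"] * 3,
    "Weight": [1, 2, 3],
    "Area": [2] * 3,
    "CarrierA_Charge": [5.0, 6.0, 7.0],
    "CarrierB_Charge": [5.5, np.nan, 6.9],
})

# Collect every carrier column, however many there are.
charge_cols = [c for c in pivot.columns if c.endswith("_Charge")]

# Row-wise minimum; NaN entries are skipped by default.
pivot["min_charge"] = pivot[charge_cols].min(axis=1)

# idxmin returns the column name of the minimum; strip prefix and suffix to get the carrier.
pivot["min_charge_carrier"] = (
    pivot[charge_cols].idxmin(axis=1)
    .str.replace(r"^Carrier|_Charge$", "", regex=True)
)
print(pivot)
```

When two carriers tie, `idxmin` reports the first matching column, which matches the behavior of the `np.where` version for carrier A.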