Is there a pandas function to transpose a data frame to create a separate column for each unique value of an existing column?

I want to reshape my data frame into a format that is easy to analyze. Currently, my data frame looks like this:

 Carrier | Service | Weight | Area | Charge
   A     |   GRND  |  1     |  2   | $5.0
   A     |   GRND  |  2     |  2   | $6.0
   A     |   GRND  |  3     |  2   | $7.0
   B     |   GRND  |  1     |  2   | $5.5
   B     |   GRND  |  3     |  2   | $6.9

I want to convert my data into the following format:

  Service | Weight | Area | CarrierA_Charge | CarrierB_Charge
   GRND   |  1     |  2   |      $5.0       |   $5.5
   GRND   |  2     |  2   |      $6.0       |   NA
   GRND   |  3     |  2   |      $7.0       |   $6.9

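Ultimately, my goal is to create a column that gives me the lowest-charge carrier for each unique combination of Service, Weight, and Area, like this: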

  Service | Weight | Area | CarrierA_Charge | CarrierB_Charge | min_charge | min_charge_carrier
   GRND   |  1     |  2   |      $5.0       |   $5.5          |  $5.0      |   A
   GRND   |  2     |  2   |      $6.0       |   NA            |  $6.0      |   A
   GRND   |  3     |  2   |      $7.0       |   $6.9          |  $6.9      |   B

Is there a built-in pandas function that can do this, or how would I write a function in Python to do it?

IIUC:

d = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
d.rename(columns=f'{d.columns.name}{{}}_Charge'.format) \
 .reset_index().rename_axis(None, axis=1)

  Service  Weight  Area  CarrierA_Charge  CarrierB_Charge
0    GRND       1     2              5.0              5.5
1    GRND       2     2              6.0              NaN
2    GRND       3     2              7.0              6.9
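
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same set_index/unstack idea. It assumes the Charge column has already been parsed to floats (the question shows it with a leading `$`), and the name `wide` is only for illustration:

```python
import pandas as pd

# Sample data from the question; Charge is assumed to be numeric already.
df = pd.DataFrame({
    'Carrier': ['A', 'A', 'A', 'B', 'B'],
    'Service': ['GRND'] * 5,
    'Weight': [1, 2, 3, 1, 3],
    'Area': [2] * 5,
    'Charge': [5.0, 6.0, 7.0, 5.5, 6.9],
})

# Move Carrier into the index, then unstack it so each carrier becomes a column.
d = df.set_index(['Service', 'Weight', 'Area', 'Carrier'])['Charge'].unstack()

# The f-string evaluates to 'Carrier{}_Charge'; its bound .format method
# is then used as the column-renaming function.
wide = (
    d.rename(columns=f'{d.columns.name}{{}}_Charge'.format)
     .reset_index()
     .rename_axis(None, axis=1)
)
print(wide)
```

Note that unstack raises an error if the same (Service, Weight, Area, Carrier) combination appears more than once, so this only fits data where each combination is unique.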

Slightly different formatting, plus the additional columns

d0 = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
# row-wise minimum charge and the carrier (column label) that offers it
d1 = pd.concat(dict(min_charge=d0.min(axis=1), min_charge_carrier=d0.idxmin(axis=1)), axis=1)
fmt = f'{d0.columns.name}{{}}_Charge'.format

d0.rename(columns=fmt).join(d1).reset_index().rename_axis(None, axis=1)

  Service  Weight  Area  CarrierA_Charge  CarrierB_Charge  min_charge min_charge_carrier
0    GRND       1     2              5.0              5.5         5.0                  A
1    GRND       2     2              6.0              NaN         6.0                  A
2    GRND       3     2              7.0              6.9         6.9                  B

Pivot table approach

# pivot table
pivot = df.pivot_table(columns='Carrier', index=['Service', 'Weight', 'Area'], values='Charge',
                       aggfunc='min').reset_index()

# rename columns here
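
The renaming step is left open above. One possible way to fill it in is sketched below; this is an assumption about what was intended, not the answerer's own code, and it again assumes a numeric Charge column. The rename is done before reset_index so that only the carrier columns are touched:

```python
import pandas as pd

# Sample data from the question; Charge is assumed to be numeric already.
df = pd.DataFrame({
    'Carrier': ['A', 'A', 'A', 'B', 'B'],
    'Service': ['GRND'] * 5,
    'Weight': [1, 2, 3, 1, 3],
    'Area': [2] * 5,
    'Charge': [5.0, 6.0, 7.0, 5.5, 6.9],
})

pivot = df.pivot_table(index=['Service', 'Weight', 'Area'],
                       columns='Carrier',
                       values='Charge',
                       aggfunc='min')

# At this point the columns are just the carrier labels ('A', 'B') and
# pivot.columns.name is 'Carrier', so build 'CarrierA_Charge', 'CarrierB_Charge'.
pivot.columns = [f'{pivot.columns.name}{c}_Charge' for c in pivot.columns]
pivot = pivot.reset_index()
print(pivot)
```

Unlike unstack, pivot_table tolerates duplicate (Service, Weight, Area, Carrier) rows by aggregating them, here with the minimum.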

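To answer your question in full, including the extra columns:

First we create the pivot table and rename the columns accordingly:

Step 1: pivot and rename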

pivot = df.pivot_table(index=['Service', 'Weight', 'Area'], 
                       columns='Carrier', 
                       values='Charge', 
                       aggfunc=lambda x: ' '.join(x))

pivot.columns = [pivot.columns.name + col + '_Charge' for col in pivot.columns]
pivot.reset_index(inplace=True)

  Service  Weight  Area CarrierA_Charge CarrierB_Charge
0    GRND       1     2            $5.0            $5.5
1    GRND       2     2            $6.0             NaN
2    GRND       3     2            $7.0            $6.9

Step 2: create the extra columns

cols = ['CarrierA_Charge', 'CarrierB_Charge']

for col in cols:
    # regex=False treats '$' as a literal character rather than the end-of-string anchor
    pivot[col] = pivot[col].str.replace('$', '', regex=False).astype(float)

pivot['min_charge'] = pivot[['CarrierA_Charge', 'CarrierB_Charge']].min(axis=1)

pivot['min_charge_carrier'] = np.where(pivot['min_charge'].eq(pivot['CarrierA_Charge']), 
                                       'A', 'B')

  Service  Weight  Area  CarrierA_Charge  CarrierB_Charge  min_charge min_charge_carrier
0    GRND       1     2              5.0              5.5         5.0                  A
1    GRND       2     2              6.0              NaN         6.0                  A
2    GRND       3     2              7.0              6.9         6.9                  B
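
The np.where call above hard-codes two carriers. If more carriers may appear, a sketch along the following lines could generalize it using idxmin. The third carrier C and its charges are hypothetical, added only to show the behaviour, and the frame `wide` stands in for the pivoted result with numeric charges:

```python
import pandas as pd

# Wide frame shaped like the result above, with numeric charges.
wide = pd.DataFrame({
    'Service': ['GRND'] * 3,
    'Weight': [1, 2, 3],
    'Area': [2] * 3,
    'CarrierA_Charge': [5.0, 6.0, 7.0],
    'CarrierB_Charge': [5.5, float('nan'), 6.9],
    'CarrierC_Charge': [4.8, 6.2, 7.5],  # hypothetical carrier, not in the question
})

# Pick up every per-carrier charge column, however many there are.
charge_cols = [c for c in wide.columns if c.endswith('_Charge')]

# Row-wise minimum; NaN entries are skipped by default.
wide['min_charge'] = wide[charge_cols].min(axis=1)

# idxmin returns the column label of the minimum; strip the prefix and
# suffix to recover the bare carrier name.
wide['min_charge_carrier'] = (
    wide[charge_cols].idxmin(axis=1)
        .str.replace('Carrier', '', regex=False)
        .str.replace('_Charge', '', regex=False)
)
print(wide)
```

With only carriers A and B present, this gives the same min_charge and min_charge_carrier values as the np.where version.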