How do rsuffix and lsuffix work when joining multiple DataFrames?

I wrote the code below, but I can't figure out how to name the rsuffix and lsuffix parameters.

dfs_list = []
for cycle in email_df.cycle_end_date.unique():
    temp = email_df[email_df.cycle_end_date == cycle].transpose()\
                    .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
                    .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
                    .join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf')\
                    .join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff')
    dfs_list.append(temp)

All of my DataFrames have the same column names.

Example:

cycle_end_date | triggered | delivered | cost | payment_value | delivery%
2021-15-01  | 34 | 32 | 4 | 7899 | 5%
2021-31-01  | 45 | 49 | 8 | 1500 | 4%

When I print dfs_list[2].reset_index(), I do get the expected output, but I can't make sense of the suffixed column names. How are they determined?

Output:

**index |   2email |    1lsuff |    2flash |    2   | 1rsuff**
0   absolute_cost   3.00    9.40    9.40    0.00    6.00
1   bill_paid_percent   3.28    0.33    1.87    68139.72    0.28
2   bill_paid_using_reminder    21.20   0.70    9.45    1.78    0.64
3   bounced_email   5018    NaN NaN NaN NaN
4   clicked_email   13385   NaN NaN NaN NaN
5   cycle_end_date  2022-02-28  2022-02-28  2022-02-28  2022-02-28  2022-02-28

Can someone shed light on how to name the suffixes so that the columns show exactly which DataFrames were joined?

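lsuffix and rsuffix only take effect when the DataFrames being joined have overlapping column names. Columns that appear on only one side keep their names unchanged.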
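To make the overlap rule concrete, here is a minimal sketch with two made-up frames (`left`, `right` and their columns are illustrative, not taken from the question). Only the shared column `b` picks up the suffixes.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# "b" exists on both sides, so it receives the suffixes;
# "a" and "c" are unique and keep their names.
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(list(joined.columns))      # ['a', 'b_left', 'b_right', 'c']

# With no overlapping columns the suffixes are simply not used.
no_overlap = left[["a"]].join(right[["c"]], lsuffix="_left", rsuffix="_right")
print(list(no_overlap.columns))  # ['a', 'c']
```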

Let's look at the consecutive joins in your script:

temp = email_df[email_df.cycle_end_date == cycle].transpose()\
         .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
         .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
         .join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf')\
         .join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff')

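Because you call transpose on every DataFrame before joining, the frames you actually join have the original index labels as their column names. For example: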

                         1
cycle_end_date  2021-31-01
triggered               45
delivered               49
cost                     8
payment_value         1500
delivery%               4%
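A small sketch of that effect, assuming a cut-down `email_df` with a default integer index (the values are borrowed from the example table in the question): the row that survives the filter keeps its label, and `transpose()` turns that label into the column name.

```python
import pandas as pd

email_df = pd.DataFrame({
    "cycle_end_date": ["2021-15-01", "2021-31-01"],
    "triggered": [34, 45],
    "delivered": [32, 49],
})

# Filtering keeps the original row label (1 here);
# transpose() then makes that label the only column name.
one_cycle = email_df[email_df.cycle_end_date == "2021-31-01"].transpose()
print(list(one_cycle.columns))  # [1]
print(list(one_cycle.index))    # ['cycle_end_date', 'triggered', 'delivered']
```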

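When email_df and flash_df are joined, both sides may have a column named 1. That column overlaps, so the suffixes are applied and the joined df is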

                    1email       1flash
cycle_end_date  2021-15-01  2021-15-01
triggered               34          34
delivered               32          32
cost                     4           4
payment_value         7899        7899
delivery%               5%          5%

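In the next join, with sms_df, the column name (again an original index label) does not overlap with anything in the joined df above, because the left-hand columns have already been renamed to 1email and 1flash. No suffix is applied, so the output may look like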

                    1email      1flash           1
cycle_end_date  2021-15-01  2021-15-01  2021-15-01
triggered               34          34          34
delivered               32          32          32
cost                     4           4           4
payment_value         7899        7899        7899
delivery%               5%          5%          5%
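The two steps above can be reproduced with a minimal sketch. The one-row frames `email_t`, `flash_t` and `sms_t` are hypothetical stand-ins for the transposed, filtered frames, each assumed to carry the row label 1.

```python
import pandas as pd

# Hypothetical one-row frames; the row label 1 becomes the column name after transpose().
email_t = pd.DataFrame({"triggered": [34]}, index=[1]).transpose()
flash_t = pd.DataFrame({"triggered": [12]}, index=[1]).transpose()
sms_t = pd.DataFrame({"triggered": [56]}, index=[1]).transpose()

# Both sides have a column named 1, so both get a suffix.
first = email_t.join(flash_t, how="outer", lsuffix="email", rsuffix="flash")
print(list(first.columns))   # ['1email', '1flash']

# The left side no longer has a column named 1, so sms_t's column
# does not clash and the suffixes passed here are not used.
second = first.join(sms_t, how="outer", lsuffix="email", rsuffix="flash")
print(list(second.columns))  # ['1email', '1flash', 1]
```

Note that the suffix is appended to the label as text, so the integer label 1 becomes the string '1email'.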

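The same process continues for the remaining joins. In your output, for instance, 1lsuff and 1rsuff presumably come from the last join, where a column named 1 was already present on the left side.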
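If the goal is for every column to say which source it came from, one possible alternative (not what the joins above do) is to skip suffixes and let `pd.concat` label each frame with `keys`. This is only a sketch: the tiny frames and the `sources` dict are assumptions standing in for your real data.

```python
import pandas as pd

# Hypothetical stand-ins for the per-channel frames in the question.
email_df = pd.DataFrame({"cycle_end_date": ["2021-15-01", "2021-31-01"], "triggered": [34, 45]})
flash_df = pd.DataFrame({"cycle_end_date": ["2021-15-01", "2021-31-01"], "triggered": [10, 20]})
sms_df = pd.DataFrame({"cycle_end_date": ["2021-31-01"], "triggered": [7]})

sources = {"email": email_df, "flash": flash_df, "sms": sms_df}
cycle = "2021-31-01"

# Transpose each filtered frame, then place the frames side by side.
# Passing a dict makes its keys the outer column level, so every column
# is labelled with its source regardless of whether names overlap.
parts = {name: df[df.cycle_end_date == cycle].transpose() for name, df in sources.items()}
combined = pd.concat(parts, axis=1)
print(combined.columns.tolist())  # [('email', 1), ('flash', 1), ('sms', 0)]
```

The result has two-level column labels such as ('email', 1), which can be flattened afterwards if single-level names are preferred.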