How do rsuffix and lsuffix work when joining multiple dataframes?
I wrote the code below, but I can't figure out how to name the rsuffix and lsuffix parameters.
dfs_list = []
for cycle in email_df.cycle_end_date.unique():
    temp = email_df[email_df.cycle_end_date == cycle].transpose()\
        .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
        .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
        .join(upi_df[upi_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='lsuf', rsuffix='rsuf')\
        .join(ivr_df[ivr_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='lsuff', rsuffix='rsuff')
    dfs_list.append(temp)
All of my dfs have the same column names.
Example:
cycle_end_date | triggered | delivered | cost | payment_value | delivery%
2021-15-01 | 34 | 32 | 4 | 7899 | 5%
2021-31-01 | 45 | 49 | 8 | 1500 | 4%
When I print dfs_list[2].reset_index(), I do get the expected output, but I can't make sense of the suffix names. How do we define them?
Output:
**index | 2email | 1lsuff | 2flash | 2 | 1rsuff**
0 absolute_cost 3.00 9.40 9.40 0.00 6.00
1 bill_paid_percent 3.28 0.33 1.87 68139.72 0.28
2 bill_paid_using_reminder 21.20 0.70 9.45 1.78 0.64
3 bounced_email 5018 NaN NaN NaN NaN
4 clicked_email 13385 NaN NaN NaN NaN
5 cycle_end_date 2022-02-28 2022-02-28 2022-02-28 2022-02-28 2022-02-28
Can someone shed some light on how to name the suffixes so that they identify exactly which dataframe is being considered?
lsuffix and rsuffix only take effect when the dataframes being joined have overlapping column names.
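To make the overlap rule concrete, here is a minimal sketch. The frames `left` and `right` and the suffix strings are made up for illustration; they are not from the question.

```python
import pandas as pd

# Two small frames that share exactly one column name, "b".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# Only the overlapping column "b" picks up the suffixes;
# "a" and "c" keep their original names.
joined = left.join(right, how="outer", lsuffix="_left", rsuffix="_right")
print(list(joined.columns))  # ['a', 'b_left', 'b_right', 'c']

# Without suffixes, an overlapping column makes join raise a ValueError.
try:
    left.join(right, how="outer")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

So a suffix never appears on a column whose name is unique across the two frames, no matter what you pass.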
Let's look at the consecutive joins in your script:
temp = email_df[email_df.cycle_end_date == cycle].transpose()\
    .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
    .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash')\
    .join(upi_df[upi_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='lsuf', rsuffix='rsuf')\
    .join(ivr_df[ivr_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='lsuff', rsuffix='rsuff')
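Because you transpose every dataframe before joining, you are actually working with dataframes whose column names are the original row index labels. For example, the row with index 1 becomes a single column named 1: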
1
cycle_end_date 2021-31-01
triggered 45
delivered 49
cost 8
payment_value 1500
delivery% 4%
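You can check this yourself with a small frame. The sketch below uses a cut-down, made-up version of the example table (only three of its columns) and assumes each cycle appears in exactly one row.

```python
import pandas as pd

# A cut-down copy of the example table from the question.
df = pd.DataFrame({
    "cycle_end_date": ["2021-15-01", "2021-31-01"],
    "triggered": [34, 45],
    "delivered": [32, 49],
})

# Filtering keeps the original row label (1 here), and transpose
# turns that row label into the only column name.
row = df[df.cycle_end_date == "2021-31-01"]
transposed = row.transpose()

print(list(transposed.columns))  # [1]
print(list(transposed.index))    # ['cycle_end_date', 'triggered', 'delivered']
```

The column name is therefore just whatever position that cycle happened to have in the source frame, which is why numbers like 1 and 2 show up in your output.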
When email_df and flash_df are joined, the column name 1 may appear in both, so the suffixes are applied and the joined df is
1email 1flash
cycle_end_date 2021-15-01 2021-15-01
triggered 34 34
delivered 32 32
cost 4 4
payment_value 7899 7899
delivery% 5% 5%
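In the next join, with sms_df, suppose its column name (again an original index label) does not overlap with the columns of the df joined above, for instance because the earlier label was 0 and this one is 1. Then no suffix is added, and the output might look like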
0email 0flash 1
cycle_end_date 2021-15-01 2021-15-01 2021-15-01
triggered 34 34 34
delivered 32 32 32
cost 4 4 4
payment_value 7899 7899 7899
delivery% 5% 5% 5%
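The two steps above can be reproduced with toy data. This is only a sketch: `one_row` is a hypothetical helper, the values are invented, and the row labels (0, 0, 1) are chosen to match the illustration rather than taken from your real frames.

```python
import pandas as pd

def one_row(label, value):
    # Hypothetical helper: a one-row frame whose row label is `label`.
    return pd.DataFrame({"triggered": [value], "cost": [value * 2]}, index=[label])

email_t = one_row(0, 34).transpose()  # single column named 0
flash_t = one_row(0, 10).transpose()  # single column named 0, overlaps with email
sms_t = one_row(1, 20).transpose()    # single column named 1, no overlap

# First join: both sides have a column 0, so both suffixes are applied.
step1 = email_t.join(flash_t, how="outer", lsuffix="email", rsuffix="flash")
print(list(step1.columns))  # ['0email', '0flash']

# Second join: the left side now has '0email' and '0flash', the right side has 1.
# Nothing overlaps, so the suffixes passed here are ignored.
step2 = step1.join(sms_t, how="outer", lsuffix="email", rsuffix="flash")
print(list(step2.columns))  # ['0email', '0flash', 1]
```

Note that the suffix is glued directly onto the number, which is where names like 2email come from, and that a bare number in the result means that join found no overlapping column.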
The same process continues for the remaining joins.
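If the goal is simply to know which dataframe each column came from, one possible approach (a sketch, not the only way) is to skip the suffixes and rename each transposed column to its source before combining. The `sources` dict, the `make_df` helper and the values below are hypothetical, and the code assumes every frame has exactly one row per cycle.

```python
import pandas as pd

def make_df(values):
    # Hypothetical stand-in for email_df, flash_df, sms_df, ...
    return pd.DataFrame({
        "cycle_end_date": ["2021-15-01", "2021-31-01"],
        "triggered": values,
    })

sources = {
    "email": make_df([34, 45]),
    "flash": make_df([10, 11]),
    "sms": make_df([20, 21]),
}

cycle = "2021-31-01"
pieces = []
for name, df in sources.items():
    t = df[df.cycle_end_date == cycle].transpose()
    # Assumes exactly one matching row; otherwise this assignment raises.
    t.columns = [name]
    pieces.append(t)

# Side-by-side concatenation; the column names are now the source names.
combined = pd.concat(pieces, axis=1)
print(list(combined.columns))  # ['email', 'flash', 'sms']
```

Because every column gets a unique name up front, nothing overlaps and the result no longer depends on which row position a cycle happens to occupy in each frame.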