How to force Pandas apply to return all columns of parent dataframe?
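After using groupby on some columns of a DataFrame and then using apply to test whether a string is present in another column, pandas returns only the grouped-by columns and the final column created by apply. Is it possible to return all the columns associated with the groupby along with the test result? For example: group by the unique identifier of a conversation thread, test whether a string is present in another column, and then also include other columns that exist in the DataFrame for that particular group.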
I tried using groupby followed by apply with an anonymous function.
df.head()
shipment_id shipper_id courier_id Question sender
0 14 9962 228898 Let's get your furbabys home Apple pet transpo... courier
1 91919 190872 196838 Hi I'm kevin thims and I'm happy to do the job... courier
2 92187 191128 196838 Hi I'm kevin thims and I'm happy to do the job... shipper
unique_thread_indentifier = ['shipment_id', 'shipper_id', 'courier_id']
required_variables = ['shipment_id', 'shipper_id', 'courier_id', 'Question', 'sender']
df_new = (
    df
    .groupby(unique_thread_indentifier)[required_variables]
    .apply(lambda group: 'shipper' in group['sender'].unique())
    .to_frame(name='shipper_replied')
    .reset_index()
)
df_new.head()
shipment_id shipper_id courier_id shipper_replied
0 14 9962 228898 False
1 91919 190872 196838 False
2 92187 191128 196838 True
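The reason the other columns disappear: the lambda returns one scalar per group, so apply produces one row per thread, indexed only by the group keys. Question and sender can have several rows per thread, so there is nowhere to put them. A minimal sketch of one workaround (a merge, which is a different technique from the transform answer below) is to compute the per-thread flag and merge it back onto the original rows by the key columns. The sample data is hypothetical: the message texts are placeholders and an extra row was added so one thread has two messages.

```python
import pandas as pd

# Hypothetical sample data: placeholder texts, and an extra second row so
# that thread (14, 9962, 228898) contains both a courier and a shipper message.
df = pd.DataFrame({
    'shipment_id': [14, 14, 91919, 92187],
    'shipper_id': [9962, 9962, 190872, 191128],
    'courier_id': [228898, 228898, 196838, 196838],
    'Question': ['msg a', 'msg b', 'msg c', 'msg d'],
    'sender': ['courier', 'shipper', 'courier', 'shipper'],
})
unique_thread_indentifier = ['shipment_id', 'shipper_id', 'courier_id']

# apply returns one scalar per group, so this has one row per thread and
# only the key columns plus the flag; Question/sender cannot survive here.
per_thread = (
    df.groupby(unique_thread_indentifier)['sender']
      .apply(lambda s: 'shipper' in s.unique())
      .rename('shipper_replied')
      .reset_index()
)
print(per_thread)  # 3 rows, one per thread

# A left merge on the key columns copies each thread's flag onto every
# original row while keeping all the other columns.
df_new = df.merge(per_thread, on=unique_thread_indentifier, how='left')
print(df_new)  # 4 rows: original columns plus shipper_replied
```

Selecting only 'sender' before apply also avoids passing the grouping columns into the lambda, which recent pandas versions warn about.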
My goal is to include the columns Question and sender in the final DataFrame. The expected output looks like this:
shipment_id shipper_id courier_id Question sender shipper_replied
0 14 9962 228898 Let's get your furbabys home Apple pet transpo... courier False
1 91919 190872 196838 Hi I'm kevin thims and I'm happy to do the job... courier False
2 92187 191128 196838 Hi I'm kevin thims and I'm happy to do the job... shipper True
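I believe you need GroupBy.transform, which returns a result with the same length and index as the original DataFrame, so it can be assigned back as a new column: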
df['shipper_replied'] = (df.groupby(unique_thread_indentifier)['sender']
                           .transform(lambda group: 'shipper' in group.unique()))
print(df)
shipment_id shipper_id courier_id \
0 14 9962 228898
1 91919 190872 196838
2 92187 191128 196838
Question sender shipper_replied
0 Let's get your furbabys home Apple pet transpo. courier False
1 Hi I'm kevin thims and I'm happy to do the job courier False
2 Hi I'm kevin thims and I'm happy to do the job shipper True
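To see why transform keeps every row, here is a small sketch on hypothetical data (placeholder texts, plus one thread with two messages). The lambda still runs once per group, but transform broadcasts each group's scalar back to all rows of that group, so the result is aligned with the original index and needs no merge.

```python
import pandas as pd

# Hypothetical sample: placeholder texts; thread (14, 9962, 228898) has two
# messages, one from the courier and one from the shipper.
df = pd.DataFrame({
    'shipment_id': [14, 14, 91919, 92187],
    'shipper_id': [9962, 9962, 190872, 191128],
    'courier_id': [228898, 228898, 196838, 196838],
    'Question': ['msg a', 'msg b', 'msg c', 'msg d'],
    'sender': ['courier', 'shipper', 'courier', 'shipper'],
})
unique_thread_indentifier = ['shipment_id', 'shipper_id', 'courier_id']

# transform: one scalar per group, repeated for every row in that group,
# so `flag` has the same length and index as df.
flag = (
    df.groupby(unique_thread_indentifier)['sender']
      .transform(lambda group: 'shipper' in group.unique())
)

df['shipper_replied'] = flag
print(df)  # both rows of shipment 14 get True
```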
Another solution: flag the shipper rows first, then use transform('any') per thread:
df['shipper_replied'] = (df.assign(new=df['sender'].eq('shipper'))
                           .groupby(unique_thread_indentifier)['new']
                           .transform('any'))
print(df)
shipment_id shipper_id courier_id \
0 14 9962 228898
1 91919 190872 196838
2 92187 191128 196838
Question sender shipper_replied
0 Let's get your furbabys home Apple pet transpo. courier False
1 Hi I'm kevin thims and I'm happy to do the job courier False
2 Hi I'm kevin thims and I'm happy to do the job shipper True
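The second solution avoids calling a Python lambda once per group: 'any' is a built-in groupby aggregation name, which pandas generally runs through its optimized path, so it tends to scale better with many threads. The sketch below (hypothetical sample data again) uses a small variant that groups the boolean Series by the key columns directly instead of going through assign, and checks that it agrees with the lambda version.

```python
import pandas as pd

# Hypothetical sample data with placeholder message texts.
df = pd.DataFrame({
    'shipment_id': [14, 14, 91919, 92187],
    'shipper_id': [9962, 9962, 190872, 191128],
    'courier_id': [228898, 228898, 196838, 196838],
    'Question': ['msg a', 'msg b', 'msg c', 'msg d'],
    'sender': ['courier', 'shipper', 'courier', 'shipper'],
})
unique_thread_indentifier = ['shipment_id', 'shipper_id', 'courier_id']

# Vectorized: build the mask once, then group it by the key columns
# (passed as a list of Series aligned on df's index) and broadcast 'any'.
is_shipper = df['sender'].eq('shipper')
vectorized = is_shipper.groupby(
    [df[c] for c in unique_thread_indentifier]
).transform('any')

# Lambda-based version from the first solution, for comparison.
with_lambda = (
    df.groupby(unique_thread_indentifier)['sender']
      .transform(lambda group: 'shipper' in group.unique())
)

print(vectorized.tolist())
print(with_lambda.tolist())
```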