How to create a dataframe on two conditions in a lambda function using apply after groupby()?

Question

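I am trying to build portfolios from a dataframe based on the variable 'scope'. Within each time period (date) and industry, the first portfolio should contain the rows with the top 33% of 'scope' values, the second the middle 34%, and the third the bottom 33%.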
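For reference, the three-way split described above can also be expressed as a per-group label instead of three separate filters. This is a minimal sketch, not the asker's method: it swaps in `pandas.qcut` inside `groupby(...).transform`, and the sample data and the `portfolio` column name are hypothetical. Label 0 is the bottom 33%, 1 the middle 34%, 2 the top 33%.

```python
import pandas as pd

# Hypothetical sample data: one date, two industries, six rows each
data_clean = pd.DataFrame({
    "date": ["2020-01"] * 12,
    "industry": ["tech"] * 6 + ["energy"] * 6,
    "scope": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60],
})

# Within each (date, industry) group, cut 'scope' at that group's 33rd and
# 67th percentiles; labels=False returns the bin number (0, 1 or 2)
data_clean["portfolio"] = data_clean.groupby(["date", "industry"])["scope"].transform(
    lambda s: pd.qcut(s, q=[0, 0.33, 0.67, 1.0], labels=False)
)

# One sub-frame per portfolio label
portfolios = {k: g.reset_index(drop=True) for k, g in data_clean.groupby("portfolio")}
print(data_clean)
```

`qcut` bins are right-closed, so a value exactly equal to a group's 33rd percentile lands in portfolio 0 and one equal to the 67th percentile lands in portfolio 1. Groups with many tied values can produce duplicate bin edges, in which case `qcut` raises a `ValueError`.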

So far, I have grouped the data by date and industry:

group_first = data_clean.groupby(['date','industry'])

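and then used a lambda function to get, for each date and industry, the rows in the lowest tercile of 'scope' (at or below the 33rd percentile), for example: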

port = group_first.apply(lambda x: x[x['scope'] <= x.scope.quantile(0.33)]).reset_index(drop=True)
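A quick check of the single-condition filter on hypothetical data (the column names match the question, the values are made up). Recent pandas versions may emit a warning about `apply` operating on the grouping columns; the filter itself only touches 'scope'.

```python
import pandas as pd

# Hypothetical sample data: one date, two industries, six rows each
data_clean = pd.DataFrame({
    "date": ["2020-01"] * 12,
    "industry": ["tech"] * 6 + ["energy"] * 6,
    "scope": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60],
})

group_first = data_clean.groupby(["date", "industry"])

# Bottom tercile: rows at or below their own group's 33rd percentile
port = group_first.apply(
    lambda x: x[x["scope"] <= x["scope"].quantile(0.33)]
).reset_index(drop=True)

print(sorted(port["scope"].tolist()))  # [1, 2, 10, 20]
```

Each group's 33rd percentile is computed separately (2.65 for the `tech` rows, 26.5 for the `energy` rows), so the two lowest rows of each group are kept.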

This works for the first and third terciles, but not for the middle one, because I get

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

when I put two conditions in the lambda function, like this:

group_middle = data_clean.groupby(['date','industry'])
port_middle = group_middle.apply(lambda x: (x[x['scope'] > x.scope.quantile(0.67)]) and (x[x['scope'] < x.scope.quantile(0.33)])).reset_index(drop=True)
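The error comes from Python's `and`, not from `groupby` or `quantile`: `a and b` has to evaluate `bool(a)`, and pandas refuses to reduce a whole DataFrame to a single True/False. A minimal reproduction with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"scope": [1, 2, 3, 4, 5, 6]})
upper = df[df["scope"] > 2]  # a filtered DataFrame, not a boolean mask
lower = df[df["scope"] < 5]

message = None
try:
    # 'and' calls bool(upper), which pandas treats as ambiguous
    upper and lower
except ValueError as exc:
    message = str(exc)

print(message)
```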

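In other words, after grouping by date and industry, how can I get the rows of the dataframe whose 'scope' values lie between the 33rd and 67th percentiles of their group?

Any idea how to solve this?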

Answer 1

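This is a guess; I don't have data to test it.

You have < and > the wrong way round. You check scope < 33rd percentile and scope > 67th percentile, which selects 0...33 and 67...100 (requiring both at once selects nothing), but you need scope > 33rd percentile and scope < 67th percentile to get 33...67.

You also need to combine the conditions as one boolean mask inside a single selection, x[(scope > 33) & (scope < 67)], instead of x[scope > 33] and x[scope < 67]. Python's `and` calls bool() on each DataFrame, which is what raises the ValueError. Each comparison needs its own parentheses because & binds more tightly than > and <.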
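The parentheses point can be checked on a plain Series (hypothetical values). With parentheses, & combines two boolean masks element-wise; without them, Python parses `s > 2 & s < 5` as the chained comparison `s > (2 & s) < 5`, which again needs `bool()` of a Series and fails with the same kind of ValueError.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# Parenthesize each comparison so & combines two boolean Series element-wise
mask = (s > 2) & (s < 5)
print(s[mask].tolist())  # [3, 4]

# Without parentheses, & is applied first and the comparisons chain,
# which forces bool() of an intermediate Series
error = None
try:
    s > 2 & s < 5
except ValueError as exc:
    error = str(exc)

print(error)
```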

port_middle = group_middle.apply(lambda x:
    # keep rows strictly between the group's 33rd and 67th percentiles
    x[
        (x['scope'] > x.scope.quantile(0.33)) & (x['scope'] < x.scope.quantile(0.67))
    ]
).reset_index(drop=True)
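A check of the corrected middle-tercile code on hypothetical data (same made-up values as a six-rows-per-group example; recent pandas versions may warn about `apply` operating on the grouping columns). Note that with strict inequalities, a value exactly equal to a group's 33rd or 67th percentile is excluded from the middle portfolio.

```python
import pandas as pd

# Hypothetical sample data: one date, two industries, six rows each
data_clean = pd.DataFrame({
    "date": ["2020-01"] * 12,
    "industry": ["tech"] * 6 + ["energy"] * 6,
    "scope": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60],
})

group_middle = data_clean.groupby(["date", "industry"])

# Middle tercile: one combined mask, each comparison in its own parentheses
port_middle = group_middle.apply(
    lambda x: x[
        (x["scope"] > x["scope"].quantile(0.33)) & (x["scope"] < x["scope"].quantile(0.67))
    ]
).reset_index(drop=True)

print(sorted(port_middle["scope"].tolist()))  # [3, 4, 30, 40]
```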


Tags: python, lambda, apply, quantile, pandas-groupby