Issue when I groupby & fill NaN with the min date value of each group

Question

This is my dataset. I am trying to fill a date column that contains NaN values with its minimum value.

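I am trying to fill the NaN values in a date column with the minimum value of that column within each ag_id group (ag_id is the group-by key). When I run the following, I get unexpected output.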

df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

Unexpected output from running the above:

I expected every value in the 1st_rev_month column to be 2017-10-01. Instead, it looks like the values of 1st_rev_month are being taken from the revenue_month column.

My end goal is to get this result, applying the same logic to the remaining date columns (all except the revenue_month column).

Answer 1

Your code:

df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

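never refers to the column "1st_rev_month" on the right-hand side. You only reference it on the left-hand side, which means "put the result into the 1st_rev_month column". But what is the result? It is a DataFrame in which every column except ag_id has its NaNs filled with that column's group minimum, not just "1st_rev_month".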
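A minimal sketch of what the right-hand side returns, using a small hypothetical frame (the question's real data is not shown, so the values and the column order ag_id, revenue_month, 1st_rev_month are assumptions):

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataset.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-02-01"]),
})

# No column is selected after groupby, so transform works on every
# non-grouping column and returns a whole DataFrame, not one Series.
result = df.groupby("ag_id").transform(lambda x: x.fillna(x.min()))
print(type(result).__name__)     # DataFrame
print(result.columns.tolist())   # ['revenue_month', '1st_rev_month']
```

In this sketch revenue_month is the first column of the result, which is consistent with the question's observation that 1st_rev_month appeared to take its values from revenue_month.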

You only need the minimum of the "1st_rev_month" column.
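The same idea can also be written without a lambda: compute each group's minimum of 1st_rev_month with the built-in "min" aggregation, then use it only where values are missing. This is an alternative to the fix below, not the answer's own code, and the sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataset.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-02-01"]),
})

# One value per row: the minimum 1st_rev_month of that row's ag_id group
# (missing dates are skipped when taking the minimum).
group_min = df.groupby("ag_id")["1st_rev_month"].transform("min")

# Replace only the missing entries; dates that were already present are kept.
df["1st_rev_month"] = df["1st_rev_month"].fillna(group_min)
```

Using the string "min" lets pandas use its built-in grouped minimum instead of calling a Python function once per group.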

Fix:

df_test_revenue_1["1st_rev_month"]=df_test_revenue_1.groupby("ag_id")['1st_rev_month'].transform(lambda x: x.fillna(x.min()))

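Adding ['1st_rev_month'] after the groupby selects only the relevant column, so the lambda sees just that column and the result is a single Series that fits into 1st_rev_month.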
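For the question's end goal (the same logic on every date column except revenue_month), one sketch is to select a list of columns after the groupby instead of a single one. The sample data and the extra column other_date are hypothetical, and the sketch assumes the date columns already have a datetime64 dtype:

```python
import pandas as pd

# Hypothetical sample data; "other_date" stands in for the question's
# remaining date columns, whose real names are not shown.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-02-01"]),
    "other_date": pd.to_datetime(
        [None, "2017-09-15", None, None, None]),
})

# Every datetime column except revenue_month, which must stay untouched.
date_cols = [c for c in df.select_dtypes(include="datetime").columns
             if c != "revenue_month"]

# Each NaN is filled with the minimum of its own column within its ag_id group;
# the result has exactly the selected columns, so it assigns back cleanly.
df[date_cols] = df.groupby("ag_id")[date_cols].transform(lambda x: x.fillna(x.min()))
```

A group where a column is entirely missing (other_date for ag_id 2 here) has no minimum to copy, so those cells stay NaT.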
