Compute number of missing values by group on another dataframe column based on conditions

Suppose I have the following data:

import pandas as pd

df=pd.DataFrame({"id":[1,1,1,2,2,3,4],
             "date":[2019,2019,2020,2020,2020,2020,2021],
             "subgroup":["con","ind","ind","con","ind","ind","ind"],
             "value":[1,None,2,None,1,3,4]})

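I want to group by id and date and, within each of those groups, get a column that counts the missing values in the value column depending on the value in the subgroup column (in this case, when subgroup == "ind"). The output would look like this: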

id      date    subgroup   value  count
1       2019      con      1       1
1       2019      ind      None    1
1       2020      ind      2       0
2       2020      con      None    0
2       2020      ind      1       0
3       2020      ind      3       0
4       2021      ind      4       0

How can I do this?

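# Flag the rows where subgroup is 'ind' and value is missing
df['counter'] = 0
df.loc[(df.subgroup=='ind') & (df.value.isna()), 'counter'] = 1
# Sum the flags within each (id, date) group and broadcast back to every row
df['goal'] = df.groupby(["id","date"])['counter'].transform('sum')
# Drop the helper column
df = df.drop(columns='counter')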
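
For comparison, here is a minimal sketch of the same idea without the temporary column. It assumes the sample data from the question; the name is_missing_ind is only illustrative, and the result is stored in a column called count to match the expected output in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3, 4],
                   "date": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
                   "subgroup": ["con", "ind", "ind", "con", "ind", "ind", "ind"],
                   "value": [1, None, 2, None, 1, 3, 4]})

# True only where the row belongs to subgroup 'ind' AND its value is missing
# (is_missing_ind is an illustrative name, not from the original answer)
is_missing_ind = df["subgroup"].eq("ind") & df["value"].isna()

# Sum the flags within each (id, date) group and broadcast back to every row
df["count"] = (is_missing_ind
               .groupby([df["id"], df["date"]])
               .transform("sum")
               .astype(int))

print(df)
```

Because transform returns a result aligned with the original index, every row in an (id, date) group receives the group's total, including the rows whose subgroup is not 'ind'.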

But as Alollz pointed out, your example code does not produce your example dataframe.

You need to find the rows where subgroup == 'ind', then sum isnull() over the value column:

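# Keep only the rows where subgroup is 'ind'
new_df = df.loc[df['subgroup'] == 'ind']
# Count the missing values in 'value' for each (id, date) group
nans = new_df['value'].isnull().groupby([new_df['id'], new_df['date']]).sum()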
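
To get from the filter-then-count idea to the per-row column the question asks for, the per-group counts can be merged back onto the original frame. The following is only a sketch under that assumption; the names ind_rows, counts, and result are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3, 4],
                   "date": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
                   "subgroup": ["con", "ind", "ind", "con", "ind", "ind", "ind"],
                   "value": [1, None, 2, None, 1, 3, 4]})

# Keep only the rows where subgroup is 'ind'
ind_rows = df.loc[df["subgroup"] == "ind"]

# Count missing values per (id, date) group among those rows
counts = (ind_rows["value"].isna()
          .groupby([ind_rows["id"], ind_rows["date"]])
          .sum()
          .rename("count")
          .reset_index())

# Attach the group counts to every original row; groups with no 'ind' rows get 0
result = df.merge(counts, on=["id", "date"], how="left")
result["count"] = result["count"].fillna(0).astype(int)

print(result)
```

The left merge keeps every row of df, so rows in groups that have no 'ind' entry are not dropped; they simply end up with a count of 0.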