Compute the number of missing values by group on another dataframe column, based on a condition
Suppose I have the following data:
import pandas as pd
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3, 4],
                   "date": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
                   "subgroup": ["con", "ind", "ind", "con", "ind", "ind", "ind"],
                   "value": [1, None, 2, None, 1, 3, 4]})
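I want to group by id and date and, within each of those groups, get a column that counts the number of missing values in the value column, conditional on the subgroup column (in this case, only where subgroup == "ind").
The output would look like this: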
id date subgroup value count
1 2019 con 1 1
1 2019 ind None 1
1 2020 ind 2 0
2 2020 con None 0
2 2020 ind 1 0
3 2020 ind 3 0
4 2021 ind 4 0
How can I do this?
df['counter'] = 0
df.loc[(df.subgroup=='ind') & (df.value.isna()), 'counter'] = 1
df['goal'] = df.groupby(["id","date"])['counter'].transform('sum')
df = df.drop(columns='counter')
But as Alollz pointed out, your example code does not produce your example dataframe.
You need to find the rows where subgroup == 'ind', then sum isnull() over the value column:
new_df = df.loc[df['subgroup'] == 'ind']
# count the missing values per (id, date) group among the 'ind' rows
nans = new_df['value'].isnull().groupby([new_df['id'], new_df['date']]).sum()
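For reference, the two ideas above (flag the rows where subgroup == 'ind' and value is missing, then sum the flags within each id/date group) can be combined without a temporary column. This is only a minimal sketch, assuming the sample data from the question; the name is_missing_ind is made up for illustration, and the result column is called count to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 4],
    "date": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
    "subgroup": ["con", "ind", "ind", "con", "ind", "ind", "ind"],
    "value": [1, None, 2, None, 1, 3, 4],
})

# True only where the row belongs to the "ind" subgroup AND its value is missing
# (is_missing_ind is a hypothetical helper name, not from the original posts)
is_missing_ind = df["subgroup"].eq("ind") & df["value"].isna()

# Sum the flags within each (id, date) group and broadcast the total to every row
df["count"] = (
    is_missing_ind.groupby([df["id"], df["date"]]).transform("sum").astype(int)
)

print(df)
```

Because transform returns a result aligned to the original index, every row in a group receives the group total, including the "con" rows, which is what the desired output shows for id 1, date 2019.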