How to group by and count the number of non-missing values for each column in a group in pandas

Question

I have the following dataset:

user_id  var  qualified_date  loyal_date
1        1    2017-01-17      2017-02-03
2        1    2017-01-03      2017-01-13
3        1    2017-01-11      NaT
4        1    NaT             NaT
5        1    NaT             NaT
6        2    2017-01-15      2017-02-14
7        2    2017-01-07      NaT
8        2    2017-01-23      2017-02-18
9        2    2017-01-25      NaT
10       2    2017-01-11      2017-03-01
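To reproduce the example, the table can be rebuilt as a DataFrame. This is a minimal sketch: the column names and values are copied from the table, and parsing the dates with pd.to_datetime (so the empty cells become NaT) is an assumption about how the original data was typed.

```python
import pandas as pd

# Rebuild the sample dataset; pd.to_datetime turns None into NaT.
df = pd.DataFrame({
    "user_id": range(1, 11),
    "var": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "qualified_date": pd.to_datetime([
        "2017-01-17", "2017-01-03", "2017-01-11", None, None,
        "2017-01-15", "2017-01-07", "2017-01-23", "2017-01-25", "2017-01-11",
    ]),
    "loyal_date": pd.to_datetime([
        "2017-02-03", "2017-01-13", None, None, None,
        "2017-02-14", None, "2017-02-18", None, "2017-03-01",
    ]),
})

# Both date columns should be datetime64 with NaT marking the missing cells.
print(df.dtypes)
```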

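I need to group this dataframe by the values in 'var' and then count the non-missing values in each of the 'qualified_date' and 'loyal_date' columns. I could do this for each column separately and then combine the results into a dataframe by hand, but I'm looking for a groupby method (or something similar) that produces a new DF directly, with the values of 'var' as the index and two columns showing the count of non-missing values for each group.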
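For comparison, the manual approach described above might look like the following sketch: one groupby count per column, then pd.concat to place the resulting Series side by side. The sample data is rebuilt from the table, and the variable names qualified, loyal and manual are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "var": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "qualified_date": pd.to_datetime([
        "2017-01-17", "2017-01-03", "2017-01-11", None, None,
        "2017-01-15", "2017-01-07", "2017-01-23", "2017-01-25", "2017-01-11",
    ]),
    "loyal_date": pd.to_datetime([
        "2017-02-03", "2017-01-13", None, None, None,
        "2017-02-14", None, "2017-02-18", None, "2017-03-01",
    ]),
})

# One column at a time: count() skips NaT, then rename to the target label.
qualified = df.groupby("var")["qualified_date"].count().rename("qualified_count")
loyal = df.groupby("var")["loyal_date"].count().rename("loyal_count")

# Stitch the two Series together; both are indexed by the 'var' values.
manual = pd.concat([qualified, loyal], axis=1)
print(manual)
```

This works, but every new column needs another line, which is the repetition the question wants to avoid.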

Something like this:

var  qualified_count loyal_count
 1       xx            xx
 2       xx            xx

Answer 1

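You can use DataFrameGroupBy.count, which counts only non-NaN (and non-NaT) entries. So make var the grouping key, select the two columns of interest, and aggregate both with count in a single call: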

cols = ['qualified_date', 'loyal_date']
# count() ignores NaN/NaT, so each cell is the number of non-missing values in that group
df.groupby('var')[cols].agg('count').add_suffix("_count").reset_index()
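Below is a self-contained run of the answer on the sample data, plus a variant that uses named aggregation (available in pandas 0.25 and later) to produce the exact column names requested in the question, qualified_count and loyal_count, with var kept as the index. The named-aggregation variant is a swapped-in alternative, not part of the original answer; the names suffixed and named are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "var": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "qualified_date": pd.to_datetime([
        "2017-01-17", "2017-01-03", "2017-01-11", None, None,
        "2017-01-15", "2017-01-07", "2017-01-23", "2017-01-25", "2017-01-11",
    ]),
    "loyal_date": pd.to_datetime([
        "2017-02-03", "2017-01-13", None, None, None,
        "2017-02-14", None, "2017-02-18", None, "2017-03-01",
    ]),
})

cols = ["qualified_date", "loyal_date"]

# The answer's approach: count() per column, suffix the labels, move 'var' back to a column.
suffixed = df.groupby("var")[cols].agg("count").add_suffix("_count").reset_index()
print(suffixed)

# Variant: named aggregation picks the output column names directly; 'var' stays the index.
named = df.groupby("var").agg(
    qualified_count=("qualified_date", "count"),
    loyal_count=("loyal_date", "count"),
)
print(named)

# For contrast, size() counts every row in the group, missing values included.
print(df.groupby("var").size())
```

With this data, both versions should report 3 qualified and 2 loyal dates for var 1, and 5 qualified and 3 loyal dates for var 2, while size() reports 5 rows per group.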


Tags: python, dataframe, python-3.x, pandas, pandasql