Count rows with 2 different date-time columns in Python

Question

I have a DataFrame with 2 date columns:

 ---------------------------- 
| date_created |  date_ended |
|--------------| ----------- |
|20/12/01      | 20/11/01    |
|20/12/01      | 20/12/02    |
|20/12/02      | 20/12/02    |
|20/12/02      | 20/12/03    |
|20/12/02      | 20/12/03    |
|20/12/03      | 20/12/03    |
|20/12/03      | 20/12/04    |
 ----------------------------

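I need to count, for each date, how many rows have that date in each of the two columns, with both counts aligned on the same date value. This is the output I need: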

 ------------------------------------------
| date_index   |created_count| ended_count |
|--------------| ----------- | ----------- |
|20/11/01      |      0      |      1      |
|20/12/01      |      2      |      0      |
|20/12/02      |      3      |      2      |
|20/12/03      |      2      |      3      |
|20/12/04      |      0      |      1      |
 ------------------------------------------

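So far I have been counting each column separately and then matching the results on the same date index. Is there a cleaner way to achieve this? Any help would be appreciated.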

Answer 1

You can do it like this:

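import pandas as pd

# Count each column, then align on the date; keys= fixes the column names
# (value_counts names its result "count" in pandas 2.x).
res = pd.concat((df['date_created'].value_counts(),
                 df['date_ended'].value_counts()),
                axis=1, sort=True, keys=['date_created', 'date_ended']).fillna(0).astype(int)
print(res)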

Output

          date_created  date_ended
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1
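
For reference, here is a minimal self-contained sketch of the same idea that can be run as is. It assumes the sample data from the question stored as plain strings, and it borrows the names date_index, created_count and ended_count from the desired output in the question; adjust them as needed.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings).
df = pd.DataFrame({
    "date_created": ["20/12/01", "20/12/01", "20/12/02", "20/12/02",
                     "20/12/02", "20/12/03", "20/12/03"],
    "date_ended": ["20/11/01", "20/12/02", "20/12/02", "20/12/03",
                   "20/12/03", "20/12/03", "20/12/04"],
})

# Count each column separately and align the two counts on the date.
# keys= sets the output column names (taken from the question's table).
res = pd.concat(
    [df["date_created"].value_counts(), df["date_ended"].value_counts()],
    axis=1,
    keys=["created_count", "ended_count"],
).fillna(0).astype(int).sort_index()

# Turn the date index into a regular column named date_index.
res = res.rename_axis("date_index").reset_index()
print(res)
```

Because the dates are strings in a fixed-width year/month/day layout, sort_index() orders them chronologically here; with other date formats it is probably safer to parse them with pd.to_datetime first.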

Answer 2

Use DataFrame.apply with value_counts, replace the NaN values for dates missing from one column with 0, and finally convert to integers:

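# pd.Series.value_counts counts each column; the top-level pd.value_counts is deprecated in recent pandas.
df = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(df)

          date_created  date_ended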
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1

If you want to process only selected columns:

cols = ['date_created', 'date_ended']
df = df[cols].apply(pd.Series.value_counts).fillna(0).astype(int)
print(df)

          date_created  date_ended
20/11/01             0           1
20/12/01             2           0
20/12/02             3           2
20/12/03             2           3
20/12/04             0           1
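
As a variation that is not part of the answers above, the same table can also be built by reshaping to long format and cross-tabulating. This is only a sketch under the assumption that the data matches the sample in the question; the names long_df, column and date_index are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings).
df = pd.DataFrame({
    "date_created": ["20/12/01", "20/12/01", "20/12/02", "20/12/02",
                     "20/12/02", "20/12/03", "20/12/03"],
    "date_ended": ["20/11/01", "20/12/02", "20/12/02", "20/12/03",
                   "20/12/03", "20/12/03", "20/12/04"],
})

# Stack both columns into one long table: one row per (column name, date).
long_df = df.melt(var_name="column", value_name="date_index")

# Cross-tabulate date against source column; absent combinations become 0,
# so no fillna/astype step is needed.
counts = pd.crosstab(long_df["date_index"], long_df["column"])
print(counts)
```

pd.crosstab returns integer counts directly, which avoids the intermediate NaN values, at the cost of an extra reshape step.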
