Calculate the difference in days between two date fields

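I have a problem. I have two date fields, fromDate and toDate. toDate also contains a timestamp, e.g. 2021-03-22T18:59:59Z. I want to calculate the difference in days between these two values: toDate - fromDate = difference in days. But when I do this, I get the following error: [OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'. I converted the toDate field so that it no longer has the timestamp. It is worth mentioning that both fields can contain null values.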

How can I calculate the difference in days between two date fields?

Dataframe

    id  toDate                  fromDate
0   1   2021-03-22T18:59:59Z    2021-02-22
1   2   None                    2021-03-18
2   3   2021-04-22T18:59:59Z    2021-03-22
3   4   2021-02-15T18:59:59Z    2021-02-10
4   5   2021-09-15T18:59:59Z    2021-09-07
5   6   2020-01-12T18:59:59Z    None
6   7   2022-02-22T18:59:59Z    2022-01-18

Code

import pandas as pd
d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z', 
'2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z', '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', 
'2021-02-10', '2021-09-07', None, '2022-01-18']
    }
df = pd.DataFrame(data=d)
display(df)
df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate']  = pd.to_datetime(df['fromDate'], errors='coerce')
display(df)

#df['days']  = df['fromDate'].subtract(df['toDate'])
df['days'] = (df['fromDate'] - df['toDate']).dt.days

[OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'

What I want

    id  toDate                   fromDate     days
0   1   2021-03-22           2021-02-22   30
1   2   NaT                  2021-03-18   NaT
2   3   2021-04-22           2021-03-22   30
3   4   2021-02-15           2021-02-10    5
4   5   2021-09-15           2021-09-07    8
5   6   2020-01-12           NaT          NaT
6   7   2022-02-22           2022-01-18   34

To subtract the columns, toDate needs to be a datetime, so set its time to 00:00:00 with Series.dt.normalize:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.normalize()

Or with Series.dt.floor:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.floor('D')
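One caveat worth sketching: the trailing Z makes the parsed toDate values timezone-aware (UTC), while fromDate is timezone-naive, and pandas refuses to subtract the two. The following is a minimal sketch, assuming it is acceptable to treat the UTC wall-clock date as the date of toDate. The `utc=True` argument and the `tz_localize(None)` step are additions here, not part of the snippets above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z',
               '2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z',
               '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
    'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22',
                 '2021-02-10', '2021-09-07', None, '2022-01-18'],
})

# Parse toDate as UTC, drop the timezone so it matches the naive fromDate
# (assumption: the UTC date is the one we want), then reset the time to 00:00:00.
df['toDate'] = (pd.to_datetime(df['toDate'], errors='coerce', utc=True)
                  .dt.tz_localize(None)
                  .dt.normalize())
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

# Both columns are now datetime64 without a timezone, so subtraction yields timedeltas.
# Rows with a missing date on either side produce NaN.
df['days'] = (df['toDate'] - df['fromDate']).dt.days
print(df)
```

Replacing `.dt.normalize()` with `.dt.floor('D')` should give the same result here.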

Another idea is to convert both columns to dates, although this may fail in older pandas versions:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate']  = pd.to_datetime(df['fromDate'], errors='coerce').dt.date

df['days'] = (df['toDate'] - df['fromDate']).dt.days
print (df)
   id      toDate    fromDate  days
0   1  2021-03-22  2021-02-22  28.0
1   2         NaT  2021-03-18   NaN
2   3  2021-04-22  2021-03-22  31.0
3   4  2021-02-15  2021-02-10   5.0
4   5  2021-09-15  2021-09-07   8.0
5   6  2020-01-12         NaT   NaN
6   7  2022-02-22  2022-01-18  35.0
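The days column comes out as float (28.0 rather than 28) because the missing rows are stored as NaN. If whole-number days are preferred, one option is pandas' nullable integer dtype. This is a minimal sketch on a small made-up sample, assuming a pandas version that supports the 'Int64' dtype. The names `to_date`, `from_date` and `days` are illustrative only.

```python
import pandas as pd

# Small illustrative sample with a missing value on each side.
to_date = pd.to_datetime(pd.Series(['2021-03-22', None, '2021-04-22']), errors='coerce')
from_date = pd.to_datetime(pd.Series(['2021-02-22', '2021-03-18', None]), errors='coerce')

# .dt.days gives floats with NaN for missing rows; the nullable 'Int64' dtype
# keeps whole numbers and represents the gaps as <NA>.
days = (to_date - from_date).dt.days.astype('Int64')
print(days)
```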