Calculate the difference in days between two date fields
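I have a problem. I have two date fields, fromDate and toDate. toDate also contains a timestamp, for example 2021-03-22T18:59:59Z.
I want to calculate the difference in days between these two values: toDate - fromDate = difference in days.
However, when I do this, I get the following error: [OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'. I converted the toDate field to drop the time component. It is worth mentioning that both fields can contain null values.
How can I calculate the difference in days between two date fields?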
Dataframe
id toDate fromDate
0 1 2021-03-22T18:59:59Z 2021-02-22
1 2 None 2021-03-18
2 3 2021-04-22T18:59:59Z 2021-03-22
3 4 2021-02-15T18:59:59Z 2021-02-10
4 5 2021-09-15T18:59:59Z 2021-09-07
5 6 2020-01-12T18:59:59Z None
6 7 2022-02-22T18:59:59Z 2022-01-18
Code
import pandas as pd
d = {'id': [1, 2, 3, 4, 5, 6, 7],
'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z',
'2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z', '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22',
'2021-02-10', '2021-09-07', None, '2022-01-18']
}
df = pd.DataFrame(data=d)
display(df)
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
display(df)
#df['days'] = df['fromDate'].subtract(df['toDate'])
df['days'] = (df['fromDate'] - df['toDate']).dt.days
[OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'
What I want
id toDate fromDate days
0 1 2021-03-22 2021-02-22 30
1 2 NaT 2021-03-18 NaT
2 3 2021-04-22 2021-03-22 30
3 4 2021-02-15 2021-02-10 5
4 5 2021-09-15 2021-09-07 8
5 6 2020-01-12 NaT NaT
6 7 2022-02-22 2022-01-18 34
Subtraction needs datetimes in the toDate column, so set the times to 00:00:00 with Series.dt.normalize (Series.dt.floor('D') is an equivalent alternative):
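# The trailing Z makes toDate timezone-aware (UTC); drop the timezone so it can be subtracted from the naive fromDate
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.tz_localize(None).dt.normalize()
# Alternative with the same effect: floor to the day
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.tz_localize(None).dt.floor('D')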
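For reference, here is a minimal end-to-end sketch of the normalize approach on a shortened copy of the question's data. The local names to_date and from_date are only illustrative, and passing utc=True followed by tz_localize(None) is my assumption about how to handle the trailing Z, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 4, 6],
    'toDate': ['2021-03-22T18:59:59Z', None,
               '2021-02-15T18:59:59Z', '2020-01-12T18:59:59Z'],
    'fromDate': ['2021-02-22', '2021-03-18', '2021-02-10', None],
})

# Parse toDate as UTC, drop the timezone so it matches the naive fromDate,
# then reset the time of day to 00:00:00.
to_date = (pd.to_datetime(df['toDate'], errors='coerce', utc=True)
             .dt.tz_localize(None)
             .dt.normalize())
from_date = pd.to_datetime(df['fromDate'], errors='coerce')

# Both operands are datetime64 now, so the difference is a timedelta Series
# and .dt.days is available; rows with a missing date come out as NaN.
df['days'] = (to_date - from_date).dt.days
print(df)
```

Keeping the parsed values in separate variables leaves the original string columns untouched, which can be handy if the raw timestamps are still needed later.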
Another idea is to convert both columns to dates, although this should fail in older pandas versions:
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce').dt.date
df['days'] = (df['toDate'] - df['fromDate']).dt.days
print(df)
id toDate fromDate days
0 1 2021-03-22 2021-02-22 28.0
1 2 NaT 2021-03-18 NaN
2 3 2021-04-22 2021-03-22 31.0
3 4 2021-02-15 2021-02-10 5.0
4 5 2021-09-15 2021-09-07 8.0
5 6 2020-01-12 NaT NaN
6 7 2022-02-22 2022-01-18 35.0
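The days column above is float (28.0, NaN) because missing values force a float dtype. If whole numbers are preferred, one possible option is the nullable Int64 dtype. This is a sketch on a few of the question's rows, and the variable names are illustrative only.

```python
import pandas as pd

to_date = pd.to_datetime(pd.Series(['2021-03-22', None, '2021-04-22']))
from_date = pd.to_datetime(pd.Series(['2021-02-22', '2021-03-18', '2021-03-22']))

# .dt.days yields floats when NaT is present; Int64 (capital I) is the
# nullable integer dtype, which stores the gaps as <NA> instead of NaN.
days = (to_date - from_date).dt.days.astype('Int64')
print(days)
```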