How do I change the date format of a column and calculate the number of years in Jupyter?
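I've been trying to figure out how to change the date format of a large amount of data in Jupyter Notebook (the dataset has millions of rows), because the two datasets I was given use different date formats. I tried googling for code to change the date format, but had no luck. For example, after merging into a DataFrame, I want to change the date format of "Discharged" and of the initial dataset. The desired output looks like this: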
Dataset (merged using DataFrame)
ID | Age | Date Seen | Date Discharged |
---|---|---|---|
001 | 21 | 2019-10-22 | 02-02-2022 08:00:00PM |
002 | 18 | 2013-05-24 | 15-05-2019 06:30:00PM |
Desired output
ID | Age | Date Seen | Date Discharged | Calculated Years (Round Up) |
---|---|---|---|---|
001 | 21 | 2019-10-22 | 2022-02-02 | 3 |
002 | 18 | 2013-05-24 | 2019-05-15 | 6 |
Use dt.normalize:
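import pandas as pd

# Convert to datetime64 if it's not already the case.
# 'Date Discharged' is day-first with a 12-hour clock, so pass the format explicitly.
df['Date Seen'] = pd.to_datetime(df['Date Seen'], format='%Y-%m-%d')
df['Date Discharged'] = pd.to_datetime(df['Date Discharged'], format='%d-%m-%Y %I:%M:%S%p')
# Keep the date part (time reset to midnight) and compute the difference in calendar years
df['Date Discharged'] = df['Date Discharged'].dt.normalize()
df['Years'] = df['Date Discharged'].dt.year - df['Date Seen'].dt.year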
Output:
>>> df
    ID  Age  Date Seen Date Discharged  Years
0  001   21 2019-10-22      2022-02-02      3
1  002   18 2013-05-24      2019-05-15      6
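Subtracting the calendar years happens to match the "Round Up" column for these two rows, but it is not a rounding-up rule in general (2019-01-01 to 2022-12-31 would give 3, although nearly 4 years have elapsed). If a true round-up is what you need, one possible sketch is below. The helper name `years_rounded_up` is made up for illustration, and it assumes both columns are already datetime64, times of day are ignored, and the discharge date is never earlier than the seen date.

```python
import pandas as pd


def years_rounded_up(seen: pd.Series, discharged: pd.Series) -> pd.Series:
    """Elapsed years between two datetime64 Series, rounded up to whole years.

    Hypothetical helper: assumes discharged >= seen and ignores time of day.
    """
    year_diff = discharged.dt.year - seen.dt.year
    # Encode (month, day) as one integer so positions within a year can be compared
    seen_md = seen.dt.month * 100 + seen.dt.day
    discharged_md = discharged.dt.month * 100 + discharged.dt.day
    # Past the anniversary in the discharge year means a partial extra year: round up
    return year_diff + (discharged_md > seen_md).astype(int)


# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["001", "002"],
    "Age": [21, 18],
    "Date Seen": ["2019-10-22", "2013-05-24"],
    "Date Discharged": ["02-02-2022 08:00:00PM", "15-05-2019 06:30:00PM"],
})
df["Date Seen"] = pd.to_datetime(df["Date Seen"], format="%Y-%m-%d")
df["Date Discharged"] = pd.to_datetime(
    df["Date Discharged"], format="%d-%m-%Y %I:%M:%S%p"
).dt.normalize()

df["Years"] = years_rounded_up(df["Date Seen"], df["Date Discharged"])
print(df["Years"].tolist())  # [3, 6]
```

Everything here is vectorized, so it should scale to millions of rows without a Python-level loop. If you want the columns stored as text such as 2022-02-02 rather than datetime64 values, `df["Date Discharged"].dt.strftime("%Y-%m-%d")` converts them after the calculation.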