Convert a string date column with an ordinal day, abbreviated month name, and four-digit year to %Y-%m-%d
Given the following df, whose string date column has an ordinal day, an abbreviated month name, and a four-digit year:
date oil gas
0 1st Oct 2021 428 99
1 10th Sep 2021 401 101
2 2nd Oct 2020 189 74
3 10th Jan 2020 659 119
4 1st Nov 2019 691 130
5 30th Aug 2019 742 162
6 10th May 2019 805 183
7 24th Aug 2018 860 182
8 1st Sep 2017 759 183
9 10th Mar 2017 617 151
10 10th Feb 2017 591 149
11 22nd Apr 2016 343 88
12 10th Apr 2015 760 225
13 23rd Jan 2015 1317 316
How can I parse the date column into the standard %Y-%m-%d format?
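My ideas so far:
1. strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the date strings with re, keeping the day number;
2. convert the abbreviated month names to numbers (%b does not seem to be the right directive);
3. finally, convert the result to %Y-%m-%d.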
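Code that might be useful for the first step (applied through the .str accessor, since re's sub() expects a single string rather than a Series):
df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)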
Reference:
https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months
If you don't specify the format parameter, pd.to_datetime already handles this case:
>>> pd.to_datetime(df['date'])
0 2021-10-01
1 2021-09-10
2 2020-10-02
3 2020-01-10
4 2019-11-01
5 2019-08-30
6 2019-05-10
7 2018-08-24
8 2017-09-01
9 2017-03-10
10 2017-02-10
11 2016-04-22
12 2015-04-10
13 2015-01-23
Name: date, dtype: datetime64[ns]
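If an explicit format is preferred (recent pandas versions may warn that they fall back to dateutil when no format can be inferred), the three steps from the question can be sketched as below. This is a minimal sketch, not the only way to do it: the small df is a shortened copy of the question's data, and it assumes an English locale so that %b matches abbreviations such as Oct and Sep.

```python
import pandas as pd

# Shortened copy of the question's data (assumption: same column layout).
df = pd.DataFrame({
    "date": ["1st Oct 2021", "10th Sep 2021", "2nd Oct 2020", "23rd Jan 2015"],
    "oil": [428, 401, 189, 1317],
    "gas": [99, 101, 74, 316],
})

# Step 1: drop an ordinal suffix only when it directly follows a digit.
cleaned = df["date"].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)

# Step 2: parse day, abbreviated month name, and four-digit year.
# %b is locale-dependent; English month abbreviations are assumed here.
parsed = pd.to_datetime(cleaned, format="%d %b %Y")

# Step 3: render as %Y-%m-%d strings (skip this to keep datetime64 values).
df["date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

Keeping the column as datetime64 (that is, stopping after step 2) is usually more convenient for sorting and resampling; the strftime call is only needed when the result must be stored as text.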