Pandas to_datetime ignores the format

I am trying to convert dates stored in a DataFrame to datetime format. The column I want to convert stores its dates in mm/dd/yy format.

This is the script I used for the conversion:

df['dt'] = pd.to_datetime(df['dt'], format = '%d-%m-%Y')

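Even though the format I provided is incorrect, the script ran without any error and converted the dates accurately.

My question is: why did the script not throw an error when given the wrong format?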

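Consider the date 1-2-2020. Can you tell the exact date just by looking at it? The answer is no: unless you know how the date was formatted or created, i.e. whether it is day-month-year or month-day-year, you cannot really say whether it is 1st February 2020 or 2nd January 2020. So the key here is to validate the dataset and its source. There are several heuristics you can apply to your data; for example, if the data originates from the US, the common date format is MM/DD/YYYY, and if it comes from India, DD-MM-YY.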
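To make the ambiguity concrete, here is a minimal sketch that parses the same string under both assumptions. The variable names are only illustrative; the format directives are the standard strptime ones that `pd.to_datetime` accepts.

```python
import pandas as pd

ambiguous = '1-2-2020'

# Read as day-month-year: 1 February 2020
as_dmy = pd.to_datetime(ambiguous, format='%d-%m-%Y')

# Read as month-day-year: 2 January 2020
as_mdy = pd.to_datetime(ambiguous, format='%m-%d-%Y')

print(as_dmy.date(), as_mdy.date())  # 2020-02-01 2020-01-02
```

Both calls succeed, so a wrong format can go unnoticed whenever every value in the column has a day of 12 or less.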

Sample

>>> import pandas as pd
>>> df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})
>>> df
          dt
0   1-1-2020
1  15-2-2020
2  3-24-2020

CODE - throws an error as expected

>>> pd.to_datetime(df['dt'], format = '%d-%m-%Y')
Traceback (most recent call last):
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 448, in _convert_listlike_datetimes
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 778, in to_datetime
    values = convert_listlike(arg._values, True, format)
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in _convert_listlike_datetimes
    raise e
  File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 416, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas/_libs/tslibs/strptime.pyx", line 142, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '3-24-2020' does not match format '%d-%m-%Y' (match)
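If you would rather find the offending rows than stop at the first failure, one option is `errors='coerce'`, which replaces values that do not match the format with NaT. This is only a sketch on the sample frame above; `parsed` and `bad_rows` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(df['dt'], format='%d-%m-%Y', errors='coerce')

# Rows that cannot be day-month-year (here, month 24 is invalid)
bad_rows = df.loc[parsed.isna(), 'dt']
print(bad_rows.tolist())  # ['3-24-2020']
```

Note that this only catches values that are impossible under the assumed format; a value such as 1-2-2020 still parses silently either way.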

The following code worked for me:

df['date'] = pd.to_datetime(df['date'], format = '%d-%m-%Y', unit='ns')

df['date'] = pd.to_datetime(df['date'], format = '%d-%m-%Y')
df['date'] = pd.to_datetime(df.date, unit='ns')