How do I find the rows that cause an error in a DataFrame in Python?
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')
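I am trying to convert a date column. The dataset contains more than 1 million rows, and I need to find the rows whose dates fail to convert.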
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-124-d701d963ff8c> in <module>
----> 1 df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"],format='%d-%m-%y')
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format,
origin, cache)
803 result = arg.map(cache_array)
804 else:
--> 805 values = convert_listlike(arg._values, format)
806 result = arg._constructor(values, index=arg.index, name=arg.name)
807 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst,
yearfirst, exact)
458 return DatetimeIndex._simple_new(dta, name=name)
459 except (ValueError, TypeError):
--> 460 raise e
461
462 if result is None:
c:\users\dell\appdata\local\programs\python\python39\lib\site-packages\pandas\core\tools\datetimes.py
in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst,
yearfirst, exact)
421 if result is None:
422 try:
--> 423 result, timezones = array_strptime(
424 arg, format, exact=exact, errors=errors
425 )
pandas\_libs\tslibs\strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
ValueError: unconverted data remains: 12
You can loop over the values and use try and except:
causing_error_list = []
for x in df["Dt_Customer"].values:
    try:
        pd.to_datetime(x, format='%d-%m-%y')
    except (ValueError, TypeError):
        # collect every value that fails to parse with this format
        causing_error_list.append(x)
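An efficient solution is to parse the date strings to datetime with the keyword errors set to 'coerce'. Invalid strings then become NaT (not a time) instead of raising. Calling .isnull() on the result gives you a boolean mask, which you can use to extract the corresponding original values.
For example: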
import pandas as pd
df = pd.DataFrame({"Dt_Customer": ["28-12-2020", "not a date"]})
invalid = df.loc[pd.to_datetime(df["Dt_Customer"],
                                format='%d-%m-%Y',
                                errors='coerce').isnull(), "Dt_Customer"]
print(invalid)
1 not a date
Name: Dt_Customer, dtype: object
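On a large frame it can help to keep the parsed column around and to separate cells that failed to parse from cells that were already missing. The following is a minimal sketch of that idea, not a definitive recipe. The sample values and the names parsed, failed, bad_rows and bad_index are made up for illustration; only the column name Dt_Customer comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the real column has more than a million rows.
df = pd.DataFrame(
    {"Dt_Customer": ["28-12-2020", "not a date", "31-02-2021", "04-09-2012"]}
)

# Parse once; anything that does not match the format becomes NaT.
parsed = pd.to_datetime(df["Dt_Customer"], format="%d-%m-%Y", errors="coerce")

# True where parsing failed but the original cell was not already missing.
failed = parsed.isna() & df["Dt_Customer"].notna()

# Full rows that failed, with their original index labels preserved.
bad_rows = df[failed]
bad_index = bad_rows.index.tolist()

print(bad_index)
# Counting distinct bad values often reveals a small number of patterns.
print(bad_rows["Dt_Customer"].value_counts())
```

Because the mask is computed in one vectorised call, this avoids a Python-level loop over every row, and the index labels in bad_index point straight back at the offending rows.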
Note that you can also omit the format keyword to make the check less specific, i.e. to accept any date/time format the parser can parse.
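As for the message in the traceback, "unconverted data remains: 12" is what you would see if a value has a four-digit year while the format uses %y, which consumes only two digits. That is an assumption about the data, since the question does not show the failing values, and the real rows may fail for other reasons. A small sketch using the standard library's datetime.strptime with a hypothetical value:

```python
from datetime import datetime

# Hypothetical value with a four-digit year; not taken from the real dataset.
sample = "04-09-2012"

try:
    # %y expects a two-digit year, so it reads "20" and leaves "12" behind.
    datetime.strptime(sample, "%d-%m-%y")
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# %Y expects a four-digit year and parses the same string cleanly.
print(datetime.strptime(sample, "%d-%m-%Y"))
```

If the bad rows turn out to look like this, switching the format to '%d-%m-%Y' may be all that is needed.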