How to ignore an error raised while using a lambda function on DataFrame rows?

Question

I am using a lambda function to perform a Pandas operation on all rows:

match = re.compile(r"([\d]{2,4}[-|/][\d]{1,2}[-|/][\d]{2,4})")
date_to_month = lambda x: pd.to_datetime(x.group(0)).strftime("%B")
data["path"] = data["path"].str.replace(match, date_to_month, regex=True)

The DataFrame is very large, and for certain rows I get the following error:

DateParseError: Invalid date specified (17/25)

I tried adding a try/except, as shown below:

try:
    match = re.compile(r"([\d]{2,4}[-|/][\d]{1,2}[-|/][\d]{2,4})")
    date_to_month = lambda x: pd.to_datetime(x.group(0)).strftime("%B")
    data["path"] = data["path"].str.replace(match, date_to_month, regex=True)
except:
    pass

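This suppresses the error. The problem is that only one row causes it, yet every other row is affected too: because the exception aborts the whole `str.replace` call, the assignment never happens and no row is converted.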

Is there a way to skip the rows that raise an error during execution without affecting the operation on the other rows?

Answer 1

So, to mimic your use case:

import re
import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "c"], "path": ["--10--", "--99--", "--12--"]})

# Should convert '10' to 'October'
match = re.compile(r"(\d{2})")
date_to_month = lambda x: pd.to_datetime(x[0], format="%m").strftime("%B")

# Raises ValueError: unconverted data remains [99]
df["path"] = df["path"].str.replace(match, date_to_month, regex=True)

Here is one possible workaround:

def convert_month(x):
    """Put the code in a function and refactor it to use Python 're.sub'
    instead of Pandas 'str.replace'.
    """
    match = re.compile(r"(\d{2})")
    date_to_month = lambda x: pd.to_datetime(x[0], format="%m").strftime('%B')
    return re.sub(match, date_to_month, x)


def ignore_exception(func, x):
    """Define a helper function.
    """
    try:
        return func(x)
    except Exception:
        return x


df["path"] = df["path"].apply(lambda x: ignore_exception(convert_month, x))

print(df)
# Output with no error raised
  col          path
0   a   --October--
1   b        --99--
2   c  --December--
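
As a variation on the same idea, the try/except can also live inside the replacement callback itself, so that `str.replace` can still be used as in the question. This is only a minimal sketch: the name `safe_month` is hypothetical, and catching `ValueError` assumes that the parsing failures raised by `pd.to_datetime` are `ValueError` subclasses, which is the case in the pandas versions I am aware of.

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "c"], "path": ["--10--", "--99--", "--12--"]})
pattern = re.compile(r"(\d{2})")


def safe_month(m):
    """Return the month name for the matched digits, or the matched
    text unchanged if it cannot be parsed as a month (hypothetical helper).
    """
    try:
        return pd.to_datetime(m.group(0), format="%m").strftime("%B")
    except ValueError:
        # Leave this particular match as it was; other matches still convert.
        return m.group(0)


df["path"] = df["path"].str.replace(pattern, safe_month, regex=True)
print(df)
```

Note one difference in behaviour: here the fallback applies per match, so a row containing one valid and one invalid number would be partially converted, whereas the `apply`-based helper above leaves the whole row unchanged when any match fails.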

The same idea, using a decorator:

def ignore_exception(func):
    """Define a decorator."""
    def wrapper(x):
        try:
            return func(x)
        except Exception:
            return x
    return wrapper

@ignore_exception
def convert_month(x):
   ...

df["path"] = df["path"].apply(convert_month)

print(df)
# Output with no error raised
  col          path
0   a   --October--
1   b        --99--
2   c  --December--
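
For reference, a self-contained version of the decorator approach might look like the sketch below. It assumes the elided body of `convert_month` is the same logic as in the first workaround. The use of `functools.wraps` and the narrowing of the caught exception to `ValueError` are my additions, not part of the code above.

```python
import functools
import re

import pandas as pd


def ignore_exception(func):
    """Decorator: return the input unchanged if func raises ValueError."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(x):
        try:
            return func(x)
        except ValueError:
            return x
    return wrapper


@ignore_exception
def convert_month(x):
    # Assumed body: same conversion as in the first workaround.
    return re.sub(
        r"(\d{2})",
        lambda m: pd.to_datetime(m[0], format="%m").strftime("%B"),
        x,
    )


df = pd.DataFrame({"col": ["a", "b", "c"], "path": ["--10--", "--99--", "--12--"]})
df["path"] = df["path"].apply(convert_month)
print(df)
```

Catching only `ValueError` rather than `Exception` keeps unrelated bugs (for example a `TypeError` from a non-string cell) visible instead of silently skipping those rows. Widen it if your data needs that.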

Tags: python, exception, pandas