Filtering multiple rows when fixing OutOfBoundsDatetime: Out of bounds nanosecond timestamp multiple values

I tried to convert one of my columns, search_departure_date, in the DataFrame df to datetime with the code below, but I get the following error.

df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 182005 entries, 0 to 182004
Data columns (total 19 columns):
 #   Column                      Non-Null Count   Dtype         
---  ------                      --------------   -----         
 0   date                        182005 non-null  datetime64[ns]
 1   device_type                 182005 non-null  object        
 2   search_origin               182005 non-null  object        
 3   search_destination          182005 non-null  object        
 4   search_route                182005 non-null  object        
 5   search_adult_count          157378 non-null  float64       
 6   search_child_count          157378 non-null  float64       
 7   search_cabin_class          157378 non-null  object        
 8   search_type                 182005 non-null  object        
 9   search_departure_date       182005 non-null  object        
 10  search_arrival_date         97386 non-null   datetime64[ns]


df["search_departure_date"] = pd.to_datetime(df.loc[:, 'search_departure_date'], format='%Y-%m-%d')

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1478-06-14 17:17:56

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1479-03-23 17:17:56

So I tried to filter out the rows that have these timestamp values:

df.loc[df['search_departure_date'] != '1478-06-14 17:17:56']

df.loc[df['search_departure_date'] != '1479-03-23 17:17:56']


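How can I do this for multiple timestamps? I noticed they all start with 1478 or 1479, and chaining the conditions together with the pipe (|) operator is cumbersome.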

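You can try errors='coerce', which turns values that cannot be converted (including out-of-range ones) into NaT instead of raising: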

df["search_departure_date"] = pd.to_datetime(df['search_departure_date'], errors='coerce')
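For context, a minimal sketch of what the coerce approach does. The sample values are made up to stand in for your column, and it assumes a pandas version that parses to nanosecond resolution (the situation in the question), where datetime64[ns] only covers roughly the years 1677 to 2262. Under that assumption the 1478/1479 values come back as NaT, and dropna then removes those rows if you don't want to keep them.

```python
import pandas as pd

# Hypothetical sample data standing in for df['search_departure_date']
df = pd.DataFrame({
    "search_departure_date": [
        "2021-03-01 00:00:00",
        "1478-06-14 17:17:56",
        "2021-04-15 00:00:00",
        "1479-03-23 17:17:56",
    ]
})

# Values that cannot be represented are set to NaT instead of raising
converted = pd.to_datetime(df["search_departure_date"], errors="coerce")

# Optionally drop the rows that could not be converted
cleaned = df.assign(search_departure_date=converted).dropna(
    subset=["search_departure_date"]
)
print(cleaned)
```

Note that coerce also hides any other unparseable values, so it may be worth checking converted.isna().sum() before dropping anything.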

If you want to filter out the rows instead, you can use str.match while the column still holds strings:

m = df['search_departure_date'].str.match('147(8|9)')
df = df[~m]  # keep only the rows that do not start with 1478 or 1479
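Here is a small self-contained sketch of the filtering, again on made-up sample values and assuming the column holds strings. It shows two options: isin when you know the exact offending values, and str.match when you only know the prefix. The names bad_values, by_value and by_prefix are just for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for df['search_departure_date']
df = pd.DataFrame({
    "search_departure_date": [
        "2021-03-01",
        "1478-06-14 17:17:56",
        "2021-04-15",
        "1479-03-23 17:17:56",
    ]
})

# Option 1: exclude an explicit list of values, no chained | needed
bad_values = ["1478-06-14 17:17:56", "1479-03-23 17:17:56"]
by_value = df[~df["search_departure_date"].isin(bad_values)]

# Option 2: exclude everything that starts with 1478 or 1479
# (str.match anchors at the start; na=False treats missing values as no match)
mask = df["search_departure_date"].str.match(r"147[89]", na=False)
by_prefix = df[~mask]

print(by_prefix)
```

Either way, you would run this before pd.to_datetime, since the .str accessor only works on string data.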