Filtering out multiple rows to fix OutOfBoundsDatetime: Out of bounds nanosecond timestamp (multiple values)
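I tried to convert one of the columns of my DataFrame df, search_departure_date, to datetime with the code below, but got the following error. This is the df.info() output (only the first 11 of the 19 columns are listed):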
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 182005 entries, 0 to 182004
Data columns (total 19 columns):
 #   Column                 Non-Null Count   Dtype
---  ------                 --------------   -----
 0   date                   182005 non-null  datetime64[ns]
 1   device_type            182005 non-null  object
 2   search_origin          182005 non-null  object
 3   search_destination     182005 non-null  object
 4   search_route           182005 non-null  object
 5   search_adult_count     157378 non-null  float64
 6   search_child_count     157378 non-null  float64
 7   search_cabin_class     157378 non-null  object
 8   search_type            182005 non-null  object
 9   search_departure_date  182005 non-null  object
 10  search_arrival_date    97386 non-null   datetime64[ns]
df["search_departure_date"] = pd.to_datetime(df.loc[:, 'search_departure_date'], format='%Y-%m-%d')
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1478-06-14 17:17:56
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1479-03-23 17:17:56
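The error comes from the storage format: a datetime64[ns] column holds each value as a signed 64-bit count of nanoseconds since 1970-01-01, so only dates between roughly 1677-09-21 and 2262-04-11 fit, and years such as 1478 and 1479 fall outside that window. A quick way to check the limits in your own pandas installation:

```python
import pandas as pd

# Earliest and latest instants representable at nanosecond resolution
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# 2**63 nanoseconds is only about 292 years on either side of 1970
years_either_side = 2**63 / 1e9 / (365.2425 * 24 * 3600)
print(round(years_either_side))  # 292
```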
So I tried to filter out the rows that contain these timestamp values:
df.loc[df['search_departure_date'] != '1478-06-14 17:17:56']
df.loc[df['search_departure_date'] != '1479-03-23 17:17:56']
df.loc[df['search_departure_date'] != '1478-06-14 17:17:56']
How can I do this for multiple timestamps? I noticed they all start with 1478 or 1479, and chaining the conditions together with the pipe (|) operator is tedious.
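When the offending strings are known exactly, Series.isin handles any number of them in a single mask. (Excluding several values with != would also require combining the conditions with &, not |.) A minimal sketch; the small DataFrame and the bad_values list below are hypothetical stand-ins for the real data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({
    "search_departure_date": ["2021-05-01", "1478-06-14", "2021-06-15", "1479-03-23"],
})

# Exact strings to drop (illustrative values, not taken from the real data)
bad_values = ["1478-06-14", "1479-03-23"]

# ~isin keeps every row whose value is NOT in the list
df = df.loc[~df["search_departure_date"].isin(bad_values)]
print(df["search_departure_date"].tolist())  # ['2021-05-01', '2021-06-15']
```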
You can try errors='coerce', which returns NaT for values that cannot be converted instead of raising:
df["search_departure_date"] = pd.to_datetime(df['search_departure_date'], errors='coerce')
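If the goal is to get rid of those rows afterwards, the NaT values produced by coercion can simply be dropped. A sketch on hypothetical sample data, assuming a pandas version that parses strings at nanosecond resolution, as the one in the question does (newer releases may infer a coarser resolution and parse such old dates without error, so check your version):

```python
import pandas as pd

# Hypothetical sample: one in-range date and one out-of-range date
df = pd.DataFrame({"search_departure_date": ["2021-05-01", "1478-06-14"]})

# With errors='coerce', values that cannot be converted become NaT instead of raising
df["search_departure_date"] = pd.to_datetime(
    df["search_departure_date"], format="%Y-%m-%d", errors="coerce"
)

# Drop the rows that were coerced to NaT
df = df.dropna(subset=["search_departure_date"])
print(df)
```

Note that coercion also silently turns genuinely malformed strings into NaT, so this drops them too; filtering explicitly keeps that distinction visible.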
If you want to filter out those rows instead, you can use str.match:
m = df['search_departure_date'].str.match('147(8|9)')
df = df.loc[~m]  # keep only rows whose date does not start with 1478 or 1479
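A more general variant, not part of the original answer: instead of hardcoding 1478 and 1479, read the leading year and keep only rows inside the nanosecond range, then convert. This sketch assumes every value starts with a four-digit year (YYYY-MM-DD...); the sample DataFrame is hypothetical:

```python
import pandas as pd

# Hypothetical sample data; assumes every value starts with a 4-digit year
df = pd.DataFrame({
    "search_departure_date": ["2021-05-01", "1478-06-14", "2021-06-15", "1479-03-23"],
})

# Take the leading year and keep only years safely inside the nanosecond range
# (1677-09-21 to 2262-04-11); this conservatively drops all of 1677 and 2262.
years = df["search_departure_date"].str[:4].astype(int)
df = df.loc[years.between(1678, 2261)].copy()

# The remaining strings can now be converted without OutOfBoundsDatetime
df["search_departure_date"] = pd.to_datetime(df["search_departure_date"], format="%Y-%m-%d")
print(df["search_departure_date"].dt.year.tolist())  # [2021, 2021]
```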