pandas read_json returns object instead of datetime64 on some datetime64 columns
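I have a DataFrame that I want to serialize to JSON and then read back into a DataFrame. It has 2 datetime64 columns, but one of them comes back as object. I also lose the timezone information, but from what I have read, that cannot be avoided.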
wxdata.info()
pd.read_json(wxdata.to_json(date_format='iso')).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9853 entries, 0 to 9852
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time_of_day 9853 non-null datetime64[ns, US/Eastern]
1 temp1 9853 non-null float64
2 wind_chill 9853 non-null float64
3 heat_index 9853 non-null float64
4 dew_point 9853 non-null float64
5 degree_day 9853 non-null float64
6 density_altitude 9853 non-null float64
7 wet_bulb_globe_temp 9853 non-null float64
8 adjusted_altitude 9853 non-null float64
9 SAE_correction_factor 9853 non-null float64
10 rel_humidity 9853 non-null int64
11 inst_wind_speed 9853 non-null float64
12 inst_wind_dir 9853 non-null float64
13 two_min_rolling_avg_wind_speed 9853 non-null float64
14 two_min_rolling_avg_wind_dir 9853 non-null float64
15 ten_min_rolling_avg_wind_speed 9853 non-null float64
16 ten_min_rolling_avg_wind_dir 9853 non-null float64
17 sixty_min_winddir_atpeak 9853 non-null int64
18 sixty_min_peak_windspeed 9853 non-null float64
19 ten_min_winddir_atpeak 9853 non-null int64
20 ten_min_peak_windspeed 9853 non-null float64
21 ten_min_wind_gust_time 9853 non-null datetime64[ns, US/Eastern]
22 rain_today 9853 non-null int64
23 rain_this_week 9853 non-null int64
24 rain_this_month 9853 non-null int64
25 rain_this_year 9853 non-null int64
26 rain_rate 9853 non-null int64
27 raw_barom_pressure 9853 non-null float64
28 barom_press 9853 non-null float64
29 solar_radiation 9853 non-null int64
dtypes: datetime64[ns, US/Eastern](2), float64(19), int64(9)
memory usage: 2.3 MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9853 entries, 0 to 9852
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time_of_day 9853 non-null object
1 temp1 9853 non-null float64
2 wind_chill 9853 non-null float64
3 heat_index 9853 non-null float64
4 dew_point 9853 non-null float64
5 degree_day 9853 non-null float64
6 density_altitude 9853 non-null float64
7 wet_bulb_globe_temp 9853 non-null float64
8 adjusted_altitude 9853 non-null float64
9 SAE_correction_factor 9853 non-null float64
10 rel_humidity 9853 non-null int64
11 inst_wind_speed 9853 non-null float64
12 inst_wind_dir 9853 non-null float64
13 two_min_rolling_avg_wind_speed 9853 non-null float64
14 two_min_rolling_avg_wind_dir 9853 non-null float64
15 ten_min_rolling_avg_wind_speed 9853 non-null float64
16 ten_min_rolling_avg_wind_dir 9853 non-null float64
17 sixty_min_winddir_atpeak 9853 non-null int64
18 sixty_min_peak_windspeed 9853 non-null float64
19 ten_min_winddir_atpeak 9853 non-null int64
20 ten_min_peak_windspeed 9853 non-null float64
21 ten_min_wind_gust_time 9853 non-null datetime64[ns, UTC]
22 rain_today 9853 non-null int64
23 rain_this_week 9853 non-null int64
24 rain_this_month 9853 non-null int64
25 rain_this_year 9853 non-null int64
26 rain_rate 9853 non-null int64
27 raw_barom_pressure 9853 non-null float64
28 barom_press 9853 non-null float64
29 solar_radiation 9853 non-null int64
dtypes: datetime64[ns, UTC](1), float64(19), int64(9), object(1)
memory usage: 2.3+ MB
As you can see, the first datetime64 column comes back as object rather than datetime64. If I do this without the date_format='iso' switch, 'time_of_day' comes back as int64, not datetime64.
Thanks for any help.
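You can pass the errors='coerce' argument to the date conversion method. Entries that cannot be converted are returned as NaT (Not a Time), so you can then see which items are causing the problem. For example: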
df['time_of_day'] = pd.to_datetime(df['time_of_day'],
                                   errors='coerce',   # unparseable entries become NaT
                                   exact=True,        # boolean, not the string 'True'
                                   format='%Y-%m-%d'  # provide your format here
                                   )
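To make the "see which items are problematic" step concrete, here is a minimal sketch. The sample values in `raw` are made up for illustration and are not taken from the question's data; the idea is to keep the original strings and use the NaT positions as a mask.

```python
import pandas as pd

# Hypothetical sample data: the last entry is not a valid date.
raw = pd.Series(["2021-03-01", "2021-03-02", "not a date"])

# Invalid entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors="coerce", format="%Y-%m-%d")

# Select the original values that failed to parse.
bad = raw[parsed.isna()]
print(bad)
```

If `bad` is empty, every entry parsed with the given format and the column dtype problem lies elsewhere.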
That was it. I renamed the 'time_of_day' column to 'timestamp', and both columns are now datetime64.
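If renaming the column is not convenient, another option is to name the date columns explicitly through the `convert_dates` parameter of `read_json`. The sketch below uses a small made-up frame (three rows, column names borrowed from the question) rather than the real `wxdata`, and it assumes the original timezone is known so that it can be reapplied after reading.

```python
import io
import pandas as pd

# Hypothetical stand-in for wxdata: two tz-aware columns and one float column.
stamps = pd.to_datetime(
    ["2021-03-01 12:00:00", "2021-03-01 13:00:00", "2021-03-01 14:00:00"]
).tz_localize("US/Eastern")
df = pd.DataFrame({
    "time_of_day": stamps,
    "ten_min_wind_gust_time": stamps,
    "temp1": [1.5, 2.5, 3.5],
})

date_cols = ["time_of_day", "ten_min_wind_gust_time"]

# Serialize with ISO dates, then ask read_json to parse the named columns.
payload = df.to_json(date_format="iso")
restored = pd.read_json(io.StringIO(payload), convert_dates=date_cols)

# The JSON stores instants in UTC; convert back to the original timezone.
for col in date_cols:
    series = restored[col]
    if series.dt.tz is None:
        # Some pandas versions return naive values that represent UTC.
        series = series.dt.tz_localize("UTC")
    restored[col] = series.dt.tz_convert("US/Eastern")

print(restored.dtypes)
```

The timezone itself is not stored in the JSON, so it has to be supplied again by the reader, as the last loop does.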
https://pandas.pydata.org/docs/user_guide/io.html#io-json-reader
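Note
If convert_dates=True and the data and/or column labels appear 'date-like', large integer values may be converted to dates. The exact threshold depends on the date_unit specified. 'date-like' means that the column label meets one of the following criteria:
it ends with '_at'
it ends with '_time'
it begins with 'timestamp'
it is 'modified'
it is 'date'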
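The rules above explain the result in the question: 'ten_min_wind_gust_time' ends with '_time', while 'time_of_day' matches none of them. As a rough illustration, the hypothetical helper below restates the listed rules in plain Python. It is not pandas's internal code, and it assumes an exact, case-sensitive match, which the note does not specify.

```python
def looks_date_like(label: str) -> bool:
    """Return True if a column label matches one of the listed 'date-like' rules."""
    return (
        label.endswith("_at")
        or label.endswith("_time")
        or label.startswith("timestamp")
        or label == "modified"
        or label == "date"
    )


# Column names taken from the question, plus the renamed column.
for name in ["time_of_day", "ten_min_wind_gust_time", "timestamp"]:
    print(name, looks_date_like(name))
```

A column that fails this check is left unconverted unless it is renamed or requested explicitly.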