pandas fillna datetime column with timezone now

I have a pandas datetime column that contains None values, and I would like to fill them with datetime.now() in a specific timezone.

This is my MWE dataframe:

import pandas as pd
import pytz
from datetime import datetime

df = pd.DataFrame([
    {'end': "2017-07-01 12:00:00"},
    {'end': "2017-07-02 18:13:00"},
    {'end': None},
    {'end': "2017-07-04 10:45:00"}
])

If I fill the missing value with fillna:

pd.to_datetime(df['end']).fillna(datetime.now())

the result is a Series with the expected dtype: datetime64[ns]. But when I specify a timezone, for example:

pd.to_datetime(df['end']).fillna(
    datetime.now(pytz.timezone('US/Pacific')))

this returns a Series with dtype: object

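It seems you need to wrap the date in to_datetime inside fillna:

df['end'] = pd.to_datetime(df['end'])
# pd.datetime was removed from recent pandas releases; use datetime.datetime directly
df['end'] = df['end'].fillna(pd.to_datetime(datetime.now(pytz.timezone('US/Pacific'))))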
print (df)
                                end
0               2017-07-01 12:00:00
1               2017-07-02 18:13:00
2  2017-07-04 03:35:08.499418-07:00
3               2017-07-04 10:45:00

print (df['end'].apply(type))
0    <class 'pandas._libs.tslib.Timestamp'>
1    <class 'pandas._libs.tslib.Timestamp'>
2    <class 'pandas._libs.tslib.Timestamp'>
3    <class 'pandas._libs.tslib.Timestamp'>
Name: end, dtype: object

But the dtype is still not datetime64:

print (df['end'].dtype)
object
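
As I understand it, the reason is that a datetime64 column holds either only tz-naive values or only values in a single timezone, so mixing the two leaves pandas with nothing but object to fall back on. A minimal sketch for checking both sides of the mismatch before calling fillna, assuming pd.Timestamp.now(tz=...) as a stand-in for datetime.now(pytz.timezone(...)) (the variable names are mine, not from the question):

```python
import pandas as pd

# Parsed strings carry no timezone information, so the column is tz-naive.
end = pd.to_datetime(pd.Series(["2017-07-01 12:00:00", None]))

# Assumption: pd.Timestamp.now(tz=...) stands in for datetime.now(pytz.timezone(...)).
fill = pd.Timestamp.now(tz="US/Pacific")

column_is_naive = end.dt.tz is None       # True for a tz-naive datetime column
fill_is_aware = fill.tzinfo is not None   # True for a tz-aware fill value

print(column_is_naive, fill_is_aware)
```

If one side is naive and the other aware, the column has to be localized (or the fill value made naive) before filling.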

I think the solution is to pass the utc parameter to to_datetime:

utc : boolean, default None

Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).

df['end'] = df['end'].fillna(datetime.now(pytz.timezone('US/Pacific')))
df['end'] = pd.to_datetime(df['end'], utc=True)

#print (df)

print (df['end'].apply(type))
0    <class 'pandas._libs.tslib.Timestamp'>
1    <class 'pandas._libs.tslib.Timestamp'>
2    <class 'pandas._libs.tslib.Timestamp'>
3    <class 'pandas._libs.tslib.Timestamp'>
Name: end, dtype: object

print (df['end'].dtypes)
datetime64[ns]
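
A variation on the utc idea, as a rough sketch only: parse with utc=True first, fill with a UTC timestamp, and convert to the target zone afterwards. This assumes the naive strings are meant as UTC wall-clock times, which may not match your data, and the dtype printed at the end can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"end": ["2017-07-01 12:00:00", None, "2017-07-04 10:45:00"]})

# Assumption: the naive strings represent UTC times.
df["end"] = pd.to_datetime(df["end"], utc=True)

# The column and the fill value are now both tz-aware, so the dtype is preserved.
df["end"] = df["end"].fillna(pd.Timestamp.now(tz="UTC"))

# Optional: view the same instants in another timezone.
pacific = df["end"].dt.tz_convert("US/Pacific")

print(df["end"].dtype, pacific.dtype)
```

Note that tz_convert only changes how the instants are displayed; it does not shift them.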

Final solution:

df['end'] = pd.to_datetime(df['end']).dt.tz_localize('US/Pacific')
df['end'] = df['end'].fillna(datetime.now(pytz.timezone('US/Pacific')))

print (df.end.dtype)
datetime64[ns, US/Pacific]
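
The same idea as a self-contained sketch: localize first, then fill with a value in the same timezone. The helper name fill_with_now is hypothetical, and pd.Timestamp.now(tz=...) is my substitute for datetime.now(pytz.timezone(...)); it assumes the naive strings are local times in the given zone.

```python
import pandas as pd


def fill_with_now(series, tz):
    """Hypothetical helper: parse, localize to `tz`, then fill gaps with the current time in `tz`."""
    # Assumption: the naive values are wall-clock times in `tz`.
    localized = pd.to_datetime(series).dt.tz_localize(tz)
    # The fill value shares the column's timezone, so the tz-aware dtype is kept.
    return localized.fillna(pd.Timestamp.now(tz=tz))


df = pd.DataFrame({"end": ["2017-07-01 12:00:00", "2017-07-02 18:13:00",
                           None, "2017-07-04 10:45:00"]})
df["end"] = fill_with_now(df["end"], "US/Pacific")

print(df["end"].dtype)
```

The order matters here: filling before localizing would mix a tz-aware value into a tz-naive column again.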