Convert from naive local daylight time to naive local standard time in pandas

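I have hourly data records logged in local daylight time (US/Pacific in my case). These are read in via csv. When we spring forward, there is a gap at the start of DST at 02:00. In the fall, I believe the data collected at 01:00 PDT is labelled 01:00, and the next hour is labelled 02:00 (and assumed to be PST).

I would like to translate the timestamps so that they can be used with other data stored in PST. Below is my attempt, where I focus only on the index, which should simplify the discussion.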

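import pandas as pd
tndx = pd.DatetimeIndex(["2016-11-06 00:00", "2016-11-06 01:00", "2016-11-06 02:00", "2016-11-06 03:00"])
# Localize to US/Pacific (ambiguous times become NaT), then convert to a fixed UTC-8 zone
pst = tndx.tz_localize('US/Pacific', ambiguous="NaT").tz_convert('Etc/GMT+8')
print(pst)
naive_pst = pst.tz_localize(None)  # strip the zone to get naive PST stamps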

The output is:

DatetimeIndex(['2016-11-05 23:00:00-08:00',                       'NaT',
               '2016-11-06 02:00:00-08:00', '2016-11-06 03:00:00-08:00'],
              dtype='datetime64[ns, Etc/GMT+8]', freq=None)

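There are two things wrong with this. First, from a PST perspective, I now appear to be missing two timestamps, at 00:00 and 01:00. I understand that the process is lossy, but I don't think it needs to lose more than one timestamp. ambiguous="infer" raises an exception because there are no repeated values. When I set it explicitly to a boolean array, as karajdaar suggests, I don't lose the extra time point. However, the boolean list is not that easy to obtain: I can't derive it from tndx because tndx is not yet tz-aware. The only way I can think of is a roundabout route through datetime.dst, which involves a separate DataFrame and a conversion: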

# Create an hourly, tz-aware date range that spans the possible times
ndx2 = pd.date_range(start=pd.Timestamp(2016, 11, 5), end=pd.Timestamp(2016, 11, 7), freq='h', tz='US/Pacific')

# Here is the determination of whether it is dst
isdst = [bool(x.dst()) for x in ndx2.to_pydatetime()]

# I use DataFrame indexing to perform the lookup
# for values in my original index
df2 = pd.DataFrame({"isdst": isdst}, index=ndx2.tz_localize(None))
# The repeated fall-back hour appears twice; keep the later (standard time) entry
df2 = df2.loc[~df2.index.duplicated(keep="last")]
ambig = df2.loc[tndx, "isdst"].to_numpy()    # This is what I would use for ambiguous
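
The lookup above can be wrapped into a small helper. This is only a sketch: `dst_flags` is a hypothetical name, not a pandas function, and it assumes that every timestamp falls exactly on the hour and that the repeated fall-back hour should be treated as standard time (the `keep="last"` choice).

```python
import pandas as pd


def dst_flags(naive_index, tz="US/Pacific"):
    """Hypothetical helper: one DST flag per timestamp of a naive local index."""
    # Hourly tz-aware range covering the data, padded by a day on each side
    start = naive_index.min().normalize() - pd.Timedelta(days=1)
    end = naive_index.max().normalize() + pd.Timedelta(days=1)
    aware = pd.date_range(start=start, end=end, freq="h", tz=tz)

    # True wherever the DST offset is non-zero
    isdst = pd.Series(
        [bool(ts.dst()) for ts in aware.to_pydatetime()],
        index=aware.tz_localize(None),
    )
    # The repeated wall-clock hour appears twice; keep the later (standard) entry
    isdst = isdst[~isdst.index.duplicated(keep="last")]
    # .loc raises KeyError if a timestamp is not on the hourly grid
    return isdst.loc[naive_index].to_numpy()


tndx = pd.DatetimeIndex(
    ["2016-11-06 00:00", "2016-11-06 01:00", "2016-11-06 02:00", "2016-11-06 03:00"]
)
flags = dst_flags(tndx)
naive_pst = (
    tndx.tz_localize("US/Pacific", ambiguous=flags)
    .tz_convert("Etc/GMT+8")
    .tz_localize(None)
)
print(flags)
print(naive_pst)
```

Only the flag at the ambiguous 01:00 entry influences the result; the other flags are carried along so that the array has the same length as the index.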

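Second, I use Etc/GMT+8 because I basically stumbled onto it by trial and error and found that it gives the right offsets and timestamps, especially once I make the stamps naive again. If I don't strip the time zone information (i.e. without the final tz_localize(None)), the output is: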

>>> tndx.tz_localize('US/Pacific',ambiguous='NaT').tz_convert('Etc/GMT+8')
DatetimeIndex(['2016-11-05 23:00:00-08:00',                       'NaT',
               '2016-11-06 02:00:00-08:00', '2016-11-06 03:00:00-08:00'],
              dtype='datetime64[ns, Etc/GMT+8]', freq=None)

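The offsets in this case look fine, but the timezone in the dtype seems misleading, and in any event why is a time zone called GMT+8 giving offsets of -8? What am I not understanding about these conversions?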

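If you can't use "infer" because you have no repeated values, you can pass in a boolean array in which True indicates that daylight time is in effect (in this case let's assume it is not). The array must have one entry per timestamp:

print(tndx.tz_localize('US/Pacific', ambiguous=[False, False, False, False]).tz_convert('Etc/GMT+8'))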

According to the documentation, this flag only applies to ambiguous times, of which you have just one in this case.
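
One way to check this is to localize the same index twice, once with every flag set to True and once with every flag set to False, and compare the results. The variable names below are illustrative, and the index is assumed to be the four-entry one from the question.

```python
import numpy as np
import pandas as pd

tndx = pd.DatetimeIndex(
    ["2016-11-06 00:00", "2016-11-06 01:00", "2016-11-06 02:00", "2016-11-06 03:00"]
)

# Same naive index, opposite flags everywhere
as_dst = tndx.tz_localize("US/Pacific", ambiguous=np.ones(len(tndx), dtype=bool))
as_std = tndx.tz_localize("US/Pacific", ambiguous=np.zeros(len(tndx), dtype=bool))

# Element-wise comparison of the resulting instants
differs = as_dst != as_std
print(differs)
print(as_dst[1], as_std[1])
```

Only the 01:00 entry should differ between the two results, since it is the only wall-clock time that occurs twice on that date.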

Then, to get the naive timestamps:

print(tndx.tz_localize('US/Pacific', ambiguous=[False, False, False, False]).tz_convert('Etc/GMT+8').tz_localize(None))
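
For an index of arbitrary length, the flag array can be built from the index itself instead of being typed out. This is a minimal sketch under the same assumption as above, namely that every ambiguous wall-clock time is standard time; `all_standard` is an illustrative name.

```python
import numpy as np
import pandas as pd

tndx = pd.DatetimeIndex(
    ["2016-11-06 00:00", "2016-11-06 01:00", "2016-11-06 02:00", "2016-11-06 03:00"]
)

# Assumption: any repeated (ambiguous) hour is treated as standard time
all_standard = np.zeros(len(tndx), dtype=bool)

aware = tndx.tz_localize("US/Pacific", ambiguous=all_standard)
# Convert to the fixed UTC-8 zone, then drop the zone to get naive PST stamps
naive_pst = aware.tz_convert("Etc/GMT+8").tz_localize(None)
print(naive_pst)
```

With this choice, 00:00 PDT maps to 23:00 PST on the previous day and the other three timestamps are unchanged, so no entry becomes NaT.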

The offsets in this case look fine, but the timezone in the dtype seems misleading and in any event why is a time zone called GMT+8 giving offsets of -8? What am I not understanding about these conversions?

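I'm posting to this question because searching for an answer brought me here, and I have since found some more information.

pandas' time zone conversion functionality appears to be based on the IANA Time Zone Database.

The etcetera file of the time zone database contains this handy comment: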

Be consistent with POSIX TZ settings in the Zone names, even though this is the opposite of what many people expect. POSIX has positive signs west of Greenwich, but many people expect positive signs east of Greenwich. For example, TZ='Etc/GMT+4' uses the abbreviation "-04" and corresponds to 4 hours behind UT (i.e. west of Greenwich) even though many people would expect it to mean 4 hours ahead of UT (i.e. east of Greenwich).

See also the Wikipedia entry for IANA time zones, which says:

The special area of "Etc" is used for some administrative zones, particularly for "Etc/UTC" which represents Coordinated Universal Time. In order to conform with the POSIX style, those zone names beginning with "Etc/GMT" have their sign reversed from the standard ISO 8601 convention. In the "Etc" area, zones west of GMT have a positive sign and those east have a negative sign in their name (e.g "Etc/GMT-14" is 14 hours ahead of GMT).
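
The reversed sign is easy to confirm from pandas. The snippet below is a small check, not part of the quoted sources; the dates are arbitrary and the variable names are illustrative.

```python
from datetime import timedelta

import pandas as pd

west = pd.Timestamp("2016-11-06 12:00", tz="Etc/GMT+8")   # name says +8
east = pd.Timestamp("2016-11-06 12:00", tz="Etc/GMT-14")  # name says -14
pst = pd.Timestamp("2016-12-01 12:00", tz="US/Pacific")   # a winter date, so PST

# The UTC offsets have the opposite sign from the zone names
print(west.utcoffset() == timedelta(hours=-8))
print(east.utcoffset() == timedelta(hours=14))
# Etc/GMT+8 has the same offset as Pacific Standard Time, with no DST rules
print(west.utcoffset() == pst.utcoffset())
```

This is why Etc/GMT+8 works as a stand-in for year-round PST in the conversions above.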