5-digit year in date export (52164-01-19 00:00:00+00)

I received a data export from an unspecified source system, which includes dates in the following format:

  1. 52164-01-19 00:00:00+00
  2. 52992-08-12 04:29:36+00
  3. 52838-10-19 04:08:32.999936+00
  4. 54022-03-12 17:20:36.999936+00

I was told that the error is caused by a faulty conversion from Unix time to datetime (seconds vs. milliseconds).
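
To illustrate the suspected failure mode, here is a minimal sketch of the arithmetic: a millisecond count read as if it were a count of seconds lands roughly 50,000 years in the future. The moment `true_moment` is only an assumed example value, and the year is estimated with the average Gregorian year length rather than an exact calendar computation.

```python
from datetime import datetime, timezone

# Average Gregorian year in seconds (365.2425 days)
AVG_GREGORIAN_YEAR_SECONDS = 365.2425 * 24 * 3600

# Assumed example value, not taken from the export itself
true_moment = datetime(2020, 3, 12, tzinfo=timezone.utc)

# What the source system presumably stored: milliseconds since the epoch
millis = int(true_moment.timestamp() * 1000)

# What happens if that number is interpreted as seconds since the epoch
approx_year = 1970 + millis / AVG_GREGORIAN_YEAR_SECONDS

print(millis)            # 1583971200000
print(int(approx_year))  # 52164
```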

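We came up with a possible way to reconstruct the dates in Python: split off a "normal" year (2164) from the 5-digit year and convert the remaining years into milliseconds.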

import time
import math

d0 = '52164-01-19 00:00:00+00'
d0_y = 52164
multiplier = 5

# avg gregorian year in seconds
avg_greg = (365.2425 * 24 * 3600)

d1_modulus = 52164 % (1000 * multiplier)
d1_rest = d0_y - d1_modulus

# replace original year with modulus
d1_time = time.strptime(str(d1_modulus) + '-10-19 04:08:32', '%Y-%m-%d %H:%M:%S')

# convert to milliseconds and add d1_rest in "seconds"
bigtime = time.mktime(d1_time) + (avg_greg * d1_rest) #in milliseconds
biggertime = bigtime / 1000 # in seconds
finaltime = time.ctime(biggertime)
# finaltime = 'Thu Mar 12 07:34:41 2020'
print(finaltime)

This code may break for other dates, because some multiplier/modulus combinations produce values that are out of range for time.mktime.

Can anyone suggest an alternative or better approach?

Thanks in advance, Gab

Python's datetime only supports years between 1 and 9999.
So I installed astropy, which handles this fine:

import datetime
import re

import astropy.time as astropy_time  # installed with PIP


faulty_data = "52164-01-19 00:00:00+00"
timeformat = re.compile(r"(?P<year>\d{5})-(?P<month>\d{2})-(?P<day>\d{2}) (?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(?P<sign_tz>[+\- ])(?P<hour_tz>\d{2})")

match = timeformat.fullmatch(faulty_data)
assert match
assert len(match.group("year")) == 5
assert match.group("hour_tz") == "00"
missing_thousand_years = int(match.group("year")[0])

time = astropy_time.Time({"year": int(match.group("year")),
                          "month": int(match.group("month")),
                          "day": int(match.group("day")),
                          "hour": int(match.group("hour")),
                          "minute": int(match.group("minute")),
                          "second": int(match.group("second"))
                          },
                         scale="utc")
print(time)
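# The "Unix seconds" of the faulty date are really the original milliseconds
milliseconds = time.unix
print(milliseconds)
# fromtimestamp() without a tz argument returns local time, not UTC
actual_datetime = datetime.datetime.fromtimestamp(milliseconds / 1000)
print(actual_datetime)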

Output:

(52164, 1, 19, 0, 0, 0.)
1583971200000.0
2020-03-12 01:00:00

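So it seems the original date was 2020-03-12 01:00:00, which is close to the date you got with your method. Note that this is my local time, since datetime.datetime.fromtimestamp converts to the local timezone; the timestamp 1583971200 corresponds to 2020-03-12 00:00:00 UTC.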

Note: the astropy code raises two warnings, which you can silence.
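
If installing astropy is not an option, the same recovery can probably be done with the standard library alone. The sketch below relies on the fact that the Gregorian calendar repeats exactly every 400 years (146,097 days): the year is shifted into the range datetime supports, and the skipped cycles are added back as seconds. The names `recover_datetime` and `FAULTY_FORMAT` are made up for this example. It assumes that every value carries a `+00` offset and that fractional parts such as `.999936` are floating-point noise that can be rounded to the nearest whole "second" (which is really a millisecond).

```python
import re
from datetime import datetime, timedelta, timezone

DAYS_PER_400_YEARS = 146_097  # length of one full Gregorian cycle
SECONDS_PER_DAY = 86_400
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical pattern for the export format; assumes a "+00" offset
FAULTY_FORMAT = re.compile(
    r"(?P<year>\d+)-(?P<month>\d{2})-(?P<day>\d{2}) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"(?:\.(?P<fraction>\d+))?\+00"
)


def recover_datetime(faulty):
    """Undo a milliseconds-read-as-seconds conversion (sketch)."""
    match = FAULTY_FORMAT.fullmatch(faulty)
    if match is None:
        raise ValueError(f"unexpected format: {faulty!r}")

    year = int(match.group("year"))
    # Shift the year into [2000, 2399] by whole 400-year cycles
    cycles = (year - 2000) // 400
    shifted = datetime(
        year - 400 * cycles,
        int(match.group("month")),
        int(match.group("day")),
        int(match.group("hour")),
        int(match.group("minute")),
        int(match.group("second")),
        tzinfo=timezone.utc,
    )

    # Exact integer "seconds" between the epoch and the faulty date
    delta = shifted - EPOCH
    faulty_seconds = (
        (delta.days + cycles * DAYS_PER_400_YEARS) * SECONDS_PER_DAY
        + delta.seconds
    )

    # Assumption: a fractional part is float noise, so round it to 0 or 1
    fraction = match.group("fraction")
    if fraction:
        faulty_seconds += round(float("0." + fraction))

    # Those "seconds" were milliseconds in the source system
    return EPOCH + timedelta(milliseconds=faulty_seconds)


print(recover_datetime("52164-01-19 00:00:00+00"))
# 2020-03-12 00:00:00+00:00
```

Because everything stays in integer arithmetic, this avoids both the average-year approximation and the range limits of time.mktime, and the result is timezone-aware UTC rather than local time.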