Decimal date manipulation with Pandas in Python

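This may be a silly question, but I have searched for examples of manipulating dates in a data frame with pandas and am still confused, because my dates are in this format: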

Time A B C D

1.000347257 626.9966431 0   0   -99.98999786
1.001041651 626.9967651 0   0   -99.98999786
1.001736164 627.0130005 0   0   -99.98999786
1.002430558 627.0130005 0   0   -99.98999786
1.003124952 627.0455933 0   0   -99.98999786
1.003819466 627.0618286 0   0   -99.98999786

...

1.998263836 627.7052002 0.3417936265    0.2321419418    0.07069379836
1.998958349 627.7216187 0.3260073066    0.2284916639    0.073251158
1.999652743 627.6726074 0.3180454969    0.2164463699    0.07418025285
2.000347137 627.7371826 0.3161731362    0.2277853489    0.07479456067
2.001041651 627.7365723 0.301556468     0.2394933105    0.07920494676
2.001736164 627.7686157 0.3718534708    0.2506033182    0.07810453326

...

366.996887  625.413574  3.168393    2.114161    2.119713
366.997559  625.413391  3.163851    2.104703    2.117746
366.998261  625.461792  3.184296    2.113827    2.117964
366.998962  625.449463  3.163331    2.117869    2.116489
366.999664  625.510681  3.166895    2.126145    2.110077

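This is an excerpt from the file where my data is stored. Is there a way to use the datetime library to convert this format into something like 2010-10-23? The year here is 2011, but it is not specified in the data.

Thanks!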


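I went through the pandas documentation and, although I do not fully understand it, found something that works. The time is in decimal format, measured in days. So I simply set the unit to days and used a Timestamp as the origin to supply the year, which I already know.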

import pandas as pd

# Read Time as a number of days counted from the start of the known year
df['Time'] = pd.to_datetime(
    df['Time'], unit='D', origin=pd.Timestamp('2011-01-01')
)

With this, the result is what I wanted. It spans 366 days, as shown below:

Time A B C D
2016-01-02 00:00:30.003004800   626.996643  0.000000    0.000000    -99.989998
2016-01-02 00:01:29.998646400   626.996765  0.000000    0.000000    -99.989998
2016-01-02 00:02:30.004569600   627.013000  0.000000    0.000000    -99.989998
2016-01-02 00:03:30.000211200   627.013000  0.000000    0.000000    -99.989998
2016-01-02 00:04:29.995852800   627.045593  0.000000    0.000000    -99.989998
...     ...     ...     ...     ...
2017-01-01 23:55:31.054080000   625.413574  2.706322    2.086675    2.094654
2017-01-01 23:56:29.063040000   625.413391  2.738388    2.082261    2.092784
2017-01-01 23:57:29.707200000   625.461792  2.762815    2.097127    2.091273
2017-01-01 23:58:30.351360000   625.449463  2.698989    2.105750    2.090060
2017-01-01 23:59:30.995520000   625.510681  2.751848    2.109448    2.090664
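
For anyone who wants to try the conversion without the full file, here is a minimal sketch built from the first three rows of the question. The small DataFrame and the column names `datetime` and `datetime_doy` are illustrative only. The second column rests on an assumption the question does not confirm: that Time is a 1-based day of year (so 1.0 would mean 1 January at 00:00), in which case the result has to be shifted back by one day.

```python
import pandas as pd

# First three sample rows from the question (Time is in fractional days).
df = pd.DataFrame({
    "Time": [1.000347257, 1.001041651, 1.001736164],
    "A": [626.9966431, 626.9967651, 627.0130005],
})

# Interpret Time as days elapsed since the start of the known year.
origin = pd.Timestamp("2011-01-01")
df["datetime"] = pd.to_datetime(df["Time"], unit="D", origin=origin)

# Assumption (not stated in the question): Time is a 1-based day of year,
# so day 1.0 is 1 January 00:00 and the result must move back one day.
df["datetime_doy"] = df["datetime"] - pd.Timedelta(days=1)

print(df)
```

If the instrument counts days from zero, the `datetime` column is the one to keep; if it counts from one, `datetime_doy` is probably the intended reading.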

You can use pd.to_datetime():

# Convert the column to datetime; unit='s' reads the values as seconds since the Unix epoch
df.Time = pd.to_datetime(df.Time, unit='s')

df.head(2)
                             Time            A  B   C            D
0   1970-01-01 00:00:01.000347257   626.996643  0   0   -99.989998
1   1970-01-01 00:00:01.001041651   626.996765  0   0   -99.989998
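
The 1970 dates above appear because a bare number is read as an offset from the Unix epoch (1970-01-01), and the `unit` argument decides how large that offset is. The following sketch, using a single value from the question and the illustrative names `as_seconds` and `as_days`, shows how the same number lands on different instants depending on the unit:

```python
import pandas as pd

# One Time value from the question.
values = pd.Series([1.000347257])

# Read as seconds since 1970-01-01: about one second after the epoch.
as_seconds = pd.to_datetime(values, unit="s")

# Read as days since 1970-01-01: about one day and 30 seconds after the epoch.
as_days = pd.to_datetime(values, unit="D")

print(as_seconds.iloc[0])
print(as_days.iloc[0])
```

Neither result is in the right year, which is why the known year still has to be supplied separately.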

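Your Time column appears to be a day count with a fractional part. If you know the year, you can convert it to a datetime column as follows:
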
# 1 - convert the year to nanoseconds since the epoch
# 2 - add the day fraction, after you convert that to nanoseconds as well
# 3 - convert the resulting nanoseconds since the epoch to datetime
year = '2011'
df['datetime'] = pd.to_datetime(pd.to_datetime(year).value + df['Time']*86400*1e9)

This gives you, for example:

df
       Time           A  B  C          D                      datetime
0  1.000347  626.996643  0  0 -99.989998 2011-01-02 00:00:30.003004928
1  1.001042  626.996765  0  0 -99.989998 2011-01-02 00:01:29.998646272
2  1.001736  627.013000  0  0 -99.989998 2011-01-02 00:02:30.004569600
3  1.002431  627.013000  0  0 -99.989998 2011-01-02 00:03:30.000211200
4  1.003125  627.045593  0  0 -99.989998 2011-01-02 00:04:29.995852800
5  1.003819  627.061829  0  0 -99.989998 2011-01-02 00:05:30.001862400
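
The same three steps can probably be written without manual nanosecond arithmetic by letting pandas build the offset with `pd.to_timedelta`. This is a sketch of an alternative, not the method used above; the three sample values and the names `start` and `datetime` are illustrative.

```python
import pandas as pd

# A few Time values from the question, in fractional days.
df = pd.DataFrame({"Time": [1.000347257, 1.998263836, 366.999664]})

year = "2011"
start = pd.to_datetime(year)  # Timestamp for 2011-01-01 00:00:00

# Convert the day counts to Timedelta values and add them to the start of the year.
df["datetime"] = start + pd.to_timedelta(df["Time"], unit="D")

print(df)
```

Because the offsets are added to an explicit starting Timestamp, changing the year only requires changing `year`.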