Decimal date manipulation with Pandas in Python
This may be a silly question, but I have searched for examples of manipulating dates in a pandas DataFrame and I am stuck, because my dates are in this format:
Time A B C D
1.000347257 626.9966431 0 0 -99.98999786
1.001041651 626.9967651 0 0 -99.98999786
1.001736164 627.0130005 0 0 -99.98999786
1.002430558 627.0130005 0 0 -99.98999786
1.003124952 627.0455933 0 0 -99.98999786
1.003819466 627.0618286 0 0 -99.98999786
...
1.998263836 627.7052002 0.3417936265 0.2321419418 0.07069379836
1.998958349 627.7216187 0.3260073066 0.2284916639 0.073251158
1.999652743 627.6726074 0.3180454969 0.2164463699 0.07418025285
2.000347137 627.7371826 0.3161731362 0.2277853489 0.07479456067
2.001041651 627.7365723 0.301556468 0.2394933105 0.07920494676
2.001736164 627.7686157 0.3718534708 0.2506033182 0.07810453326
...
366.996887 625.413574 3.168393 2.114161 2.119713
366.997559 625.413391 3.163851 2.104703 2.117746
366.998261 625.461792 3.184296 2.113827 2.117964
366.998962 625.449463 3.163331 2.117869 2.116489
366.999664 625.510681 3.166895 2.126145 2.110077
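This is an excerpt from the file where my data is stored. Is there a way to use the datetime library to convert this format into something like 2010-10-23? The year here is 2011, but it is not specified in the data.
Thanks!
I went through the pandas documentation and, although I don't fully understand it, this worked. The time is in decimal format, counted in days, so I simply declared the unit and used a Timestamp to supply the year I already knew.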
df['Time'] = pd.to_datetime(
df['Time'], unit='D', origin=pd.Timestamp('2011-01-01')
)
With this, the result is what I wanted, and it spans 366 days, as shown below:
Time A B C D
2016-01-02 00:00:30.003004800 626.996643 0.000000 0.000000 -99.989998
2016-01-02 00:01:29.998646400 626.996765 0.000000 0.000000 -99.989998
2016-01-02 00:02:30.004569600 627.013000 0.000000 0.000000 -99.989998
2016-01-02 00:03:30.000211200 627.013000 0.000000 0.000000 -99.989998
2016-01-02 00:04:29.995852800 627.045593 0.000000 0.000000 -99.989998
... ... ... ... ...
2017-01-01 23:55:31.054080000 625.413574 2.706322 2.086675 2.094654
2017-01-01 23:56:29.063040000 625.413391 2.738388 2.082261 2.092784
2017-01-01 23:57:29.707200000 625.461792 2.762815 2.097127 2.091273
2017-01-01 23:58:30.351360000 625.449463 2.698989 2.105750 2.090060
2017-01-01 23:59:30.995520000 625.510681 2.751848 2.109448 2.090664
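The origin-based conversion above can be sketched end to end as follows. This is a minimal illustration, not the asker's exact script: the small `df` is hypothetical, and whether `Time` means "days elapsed since 1 January" or "1-based day of year" is an assumption the data does not settle, so both readings are shown. With `origin='2011-01-01'` the first row maps to 2011-01-02. The 2016 dates in the output above would correspond to an origin of 2016-01-01 instead.

```python
import pandas as pd

# Hypothetical sample values in the same decimal-day format as the question.
df = pd.DataFrame({"Time": [1.0, 1.5, 2.25, 366.75]})

# Reading 1 (assumption): Time is days elapsed since 2011-01-01 00:00,
# so 1.0 lands on 2 January and 1.5 on 2 January at noon.
elapsed = pd.to_datetime(df["Time"], unit="D", origin=pd.Timestamp("2011-01-01"))

# Reading 2 (assumption): Time is a 1-based day of year,
# so 1.0 should land on 1 January; shift the origin back by one day.
day_of_year = pd.to_datetime(df["Time"], unit="D", origin=pd.Timestamp("2010-12-31"))

print(pd.DataFrame({"Time": df["Time"], "elapsed": elapsed, "day_of_year": day_of_year}))
```

The two readings differ by exactly one day on every row, so it is worth checking the first and last timestamps against what is known about the recording period before settling on an origin.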
You can use pd.to_datetime():
# convert the column to datetime
df.Time = pd.to_datetime(df.Time)
df.head(2)
Time A B C D
0 1970-01-01 00:00:01.000347257 626.996643 0 0 -99.989998
1 1970-01-01 00:00:01.001041651 626.996765 0 0 -99.989998
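The 1970 dates above arise because, when no origin is given, pd.to_datetime counts numeric input from the Unix epoch (1970-01-01). A small sketch of that behaviour, using made-up scalar values rather than the question's data:

```python
import pandas as pd

# With no origin given, a number is an offset from 1970-01-01 (the Unix epoch),
# measured in whatever unit is passed.
as_seconds = pd.to_datetime(1, unit="s")  # one second after the epoch
as_days = pd.to_datetime(1.5, unit="D")   # one and a half days after the epoch

print(as_seconds)
print(as_days)
```

So the unit and the starting point both have to be stated explicitly for a column of fractional days within a known year.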
Your Time column appears to hold fractional days. If you know the year, you can convert it to a datetime column with:
# 1 - convert the year to nanoseconds since the epoch
# 2 - add the day fraction, after you convert that to nanoseconds as well
# 3 - convert the resulting nanoseconds since the epoch to datetime
year = '2011'
df['datetime'] = pd.to_datetime(pd.to_datetime(year).value + df['Time']*86400*1e9)
This gives you, for example:
df
Time A B C D datetime
0 1.000347 626.996643 0 0 -99.989998 2011-01-02 00:00:30.003004928
1 1.001042 626.996765 0 0 -99.989998 2011-01-02 00:01:29.998646272
2 1.001736 627.013000 0 0 -99.989998 2011-01-02 00:02:30.004569600
3 1.002431 627.013000 0 0 -99.989998 2011-01-02 00:03:30.000211200
4 1.003125 627.045593 0 0 -99.989998 2011-01-02 00:04:29.995852800
5 1.003819 627.061829 0 0 -99.989998 2011-01-02 00:05:30.001862400
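As a possible alternative to the nanosecond arithmetic above, the same idea can be written with pd.to_timedelta, which avoids the manual 86400*1e9 factor. This is a different formulation from the answer's code, shown only as a sketch. The sample `df` is hypothetical, and it keeps the answer's assumption that Time counts days from midnight on 1 January of the given year.

```python
import pandas as pd

# Hypothetical sample values in the same decimal-day format as the question.
df = pd.DataFrame({"Time": [1.0, 1.5, 2.25]})

year = "2011"
start = pd.to_datetime(year)  # midnight on 1 January of that year

# Assumption: Time is days elapsed since `start`.
# Convert the day fractions to timedeltas and add them to the start of the year.
df["datetime"] = start + pd.to_timedelta(df["Time"], unit="D")

print(df)
```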