Dates to Durations in Pandas

I feel like this should be very easy to do, yet I can't figure out how. I have a pandas DataFrame with a column date:

0    2012-08-21
1    2013-02-17
2    2013-02-18
3    2013-03-03
4    2013-03-04
Name: date, dtype: datetime64[ns]

I would like to have a column of durations, something like:

0    0 days
1    180 days
2    1 days
3    13 days
4    1 days
Name: date, dtype: timedelta64[ns]

My attempt yields a bunch of 0 days and NaT instead:

>>> df.date[1:] - df.date[:-1]
0       NaT
1    0 days
2    0 days
...

Any ideas?

Timedeltas are useful here (see docs):

Starting in v0.15.0, we introduce a new scalar type Timedelta, which is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.

Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
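As a small illustration of the quoted description, the sketch below subtracts two of the dates from the question and inspects the resulting Timedelta. The variable names are only illustrative.

```python
import datetime

import pandas as pd

# Subtracting two Timestamps yields a pandas Timedelta.
delta = pd.Timestamp("2013-02-17") - pd.Timestamp("2012-08-21")

# Timedelta subclasses the standard library's datetime.timedelta.
is_stdlib_subclass = isinstance(delta, datetime.timedelta)

whole_days = delta.days              # integer day component
as_seconds = delta.total_seconds()   # the full span expressed in seconds
negative = -delta                    # Timedeltas can be negative as well

print(delta, whole_days, as_seconds, negative)
```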

df

           0
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04

You can take the difference between consecutive rows and read off the whole days:

df[0].diff().fillna(pd.Timedelta(0)).dt.days

0      0
1    180
2      1
3     13
4      1
Name: 0, dtype: int64

Alternatively, you can use .shift() (or .diff(), as shown by @Andy Hayden) to compute the differences between consecutive time points:

res = df-df.shift()

which gives:

res.fillna(pd.Timedelta(0))

         0
0   0 days
1 180 days
2   1 days
3  13 days
4   1 days

You can convert these from timedelta64 dtype to integers with:

res.fillna(pd.Timedelta(0)).squeeze().dt.days

0      0
1    180
2      1
3     13
4      1
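Put together, the shift-based approach might look like the following sketch. It assumes the dates live in a column named date, as in the question; the days_since_prev column name is made up for illustration. The missing first gap is filled with pd.Timedelta(0) rather than a bare 0 so that the column keeps its timedelta dtype.

```python
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    )
})

# Each row minus the previous row; the first row has no predecessor (NaT).
gaps = df["date"] - df["date"].shift()

# Replace the leading NaT with a zero-length Timedelta, then keep whole days.
df["days_since_prev"] = gaps.fillna(pd.Timedelta(0)).dt.days

print(df)
```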

You can use diff:

In [11]: s
Out[11]:
0   2012-08-21
1   2013-02-17
2   2013-02-18
3   2013-03-03
4   2013-03-04
Name: date, dtype: datetime64[ns]

In [12]: s.diff()
Out[12]:
0        NaT
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]

In [13]: s.diff().fillna(pd.Timedelta(0))
Out[13]:
0     0 days
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]
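If plain numbers are wanted instead of a timedelta64 column, one option is to divide the differences by a unit Timedelta. This is a sketch under that assumption, not part of the session above; it leaves the first value as NaN instead of filling it.

```python
import pandas as pd

s = pd.Series(
    pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    ),
    name="date",
)

# Dividing a timedelta Series by a Timedelta gives floats in that unit.
days = s.diff() / pd.Timedelta(days=1)
hours = s.diff() / pd.Timedelta(hours=1)

print(days)
print(hours)
```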

df.date[1:] - df.date[:-1] doesn't do what you think it does. The elements are subtracted by matching Series/DataFrame index labels, not by their position in the Series.

Here is what df.date[1:] - df.date[:-1] actually computes:

+---- index of df.date[1:]
|                     +---- index of df.date[:-1]
|                     |
|                     v
v                     
                   -  0    2012-08-21    = NaT
1    2013-02-17    -  1    2013-02-17    = 0
2    2013-02-18    -  2    2013-02-18    = 0
3    2013-03-03    -  3    2013-03-03    = 0
4    2013-03-04    -                     = NaT
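The alignment shown in the diagram can be reproduced with the sketch below, which also shows one way to get a positional subtraction: strip the index from one operand with .to_numpy(). The slices use .iloc to make the positional slicing explicit, and the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(
    pd.to_datetime(
        ["2012-08-21", "2013-02-17", "2013-02-18", "2013-03-03", "2013-03-04"]
    ),
    name="date",
)

# Label-aligned: labels 1..3 are subtracted from themselves (0 days),
# while labels 0 and 4 exist on only one side and become NaT.
aligned = s.iloc[1:] - s.iloc[:-1]

# Positional: a NumPy array carries no index, so elements pair up by order.
positional = s.iloc[1:] - s.iloc[:-1].to_numpy()

print(aligned)
print(positional)
```

In practice s.diff() gives the same positional result with less ceremony.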